A practical Bayesian approach to deriving search relevance judgments from web tracking - Slides from a talk at an OpenSource Connections Friday Share (https://opensourceconnections.com)
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Click Probability, Certainty and Context
1. Click Probability,
Certainty and Context
A practical Bayesian approach to deriving search
relevance judgments from web tracking
René Kriegler (@renekrie)
OSC Friday Share
12 February 2021
2. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Judgment modelling
Problem:
- How relevant is a given document for a given query?
Approach:
- Derive relevance judgments from observed user behaviour
3. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Judgment modelling: aspects
Model shall reflect these aspects:
- Click probability: a quantity that tells us that how likely it is that a given
document is relevant to the query
- Certainty: How sure can we be about the click probability? (few
observations, high variance -> uncertainty)
- Context: try to eliminate influence of position, device, grid size, ...
4. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Clicks first, conversions later
Philip Kotler, Kevin Lane
Marketing Management
1997
Peter Morville
Ambient Findability, 2005
Problem
recognition
Information
search
Evaluation
of
alternatives
Purchase
decision
Post-
purchase
behaviour
5-15 %
Most searches happen here, and most of them are
successful
5. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Clicks first, conversions later
Problem
recognition
Information
search
Evaluation
of
alternatives
Purchase
decision
Post-
purchase
behaviour
conversions
Most searches happen here, and many of them are
successful
6. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Let’s get started: CTR as probability
C: 1 if clicked, 0 if not
cqd
: observed clicks for query-document pair qd
vqd
: tracked views for query-document pair qd
Chuklin et al., Click Models for Web Search
7. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
CTR as probability
} ?
} ?
} ?
8. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
CTR as probability
} ?
Uncertainty! We
would expect the ctr
to change with the
next event that we
collect!
9. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Probability: from scalar to function
Model an expected probability as a function of
parameters alpha and beta...
10. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Probability: from scalar to function
Model an expected probability as a function of
parameters alpha and beta...
... and update this with the observed clicks and
views
11. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Probability: from scalar to function
E[ctr]=0.1 | alpha=1, beta=9
12. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Probability: from scalar to function
E[ctr]=0.1 | alpha=1, beta=9
Shrinkage towards the
expected value!
fewer views:
=> greater uncertainty
=> greater shrinkage
=> the more we rely on the
expected value
13. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Shrinkage vs. alpha and beta
The greater alpha and beta, the greater the shrinkage
=> we can model shrinkage together with E[ctr] by just using
alpha and beta
14. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
alpha and beta
How can we find good values for
alpha and beta?
15. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Beta distribution: Beta(alpha,beta)
https://en.wikipedia.org/wiki/Beta_distribution
Expected value of Beta(alpha,beta):
16. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Beta(alpha,beta)
Method of moments to estimate alpha and beta for Beta(alpha,beta):
s2
: sample variance
(alpha + beta) ~ 1/s2
=> the lower the variance, the greater the certainty, the greater (alpha+beta), the
greater the shrinkage, the more we trust our expected ctr value, the more views we need to diverge
from it
alpha ~ probability of success, beta ~ probability of failure
We can write Beta(mean,variance)
17. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Welcome to Bayesian Modelling
Posterior
Prior Likelihood
function
Bayes’ Theorem with a ‘Beta prior’
18. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Context
How can we eliminate position
bias and other contexts from our
model?
We can solve click probability and certainty, but...
19. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Context: dealing with position bias
COEC - Clicks over expected clicks
r: rank at which clicks and views were observed
Same as above if query-doc pair was only observed at a
single rank
20. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Context: dealing with position bias
COEC - Clicks over expected clicks
Normalise by expected ctr for a given rank => great intuition!
Does not deal with ‘certainty’ => but we are good at doing this using the beta distribution to
express the expected value and the likelihood!
Cancels out rank specific relevance together with position bias => we’ll deal with this!
21. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Unpooled/per-rank parameters
Unpooled: each rank has its own expected value that reflects position bias and rank-specific
relevance => we cannot normalise by this expected value as it would remove relevance
information
22. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Pooled parameter estimation
Pooled: use a single set of parameters/expected value across all data points => we do not
remove relevance information but we also cannot handle position bias
23. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Partial pooling / hierarchical model
Partial pooling: each rank has its prior parameters (-
ctr-
r
, s2
r
) - like in the unpooled approach.
Now we assume a beta distribution of these priors with parameters µ (mean) and σ2
(variance)
Rank-specific priors modelled as Beta(µ,σ2
)
24. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Partial pooling / hierarchical model
Prior at rank r: Beta(θr
,τ2
r
)
remember that we can calculate (θr
,τ2
r
) -> (αr
,βr
)
Weighted average of the pooled mean µ and the
rank-specific mean ctr
Using variance as weight! - If the ctr varies a lot at
rank r - the pooled mean becomes more important
Keeps global relevance information
Models position bias
Estimation of µ and σ2
:
- bootstrapping
- samples weighted by views at r
- Gelman et al.: use 99th percentile variance
Per rank (unpooled)
‘Prior of priors’
25. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Aggregating across ranks
If a query-doc pair was observed at multiple ranks,
use the views-weighted rank-specific parameters
Do not attempt to aggregate the posteriors instead
of the priors (it took me 1.5 years to figure this out
;-) ) -> think: shrinkage per rank
Posterior:
(θqd
,τ2
qd
) -> (αqd
,βqd
)
26. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
From ranks to contexts
Solution:
● do not assume any relationship between ranks - results are not processed in a known
sequence
● Unpooled prior applies to a context j
desktop_2_1_5_9_false: device=desktop, page=2, results-per-row=5, position=5,
results-shown-on-page=9, has-marketing-label=false
27. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Aggregating across contexts
Simply replace rank r with context j
If you don’t have observations for a given context,
you can fall back to using just (µ, σ2
)
28. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
How good are my parameters?
Binomial distribution f(p,k,n)
k: clicks, n: views
P: (scalar) prior probability
Beta-Binomial distribution f(α,β,k,n)
k: clicks, n: views
Beta(α,β) as prior
=> max likelihood for clicks & views
from test set
29. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
And the judgment?
Posterior:
(θqd
,τ2
qd
) -> (αqd
,βqd
)
Judgment:
(Which is posterior probability over expected
value)
30. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Next: Combine w/ CRO
Conversion probabilities can be modelled the same way, i.e. using a hierarchical model with
context-specific parameters:
P(O=1) is the probability of a purchase order
Model P(C=1) and P(O=1) separately and multiply. You can add a weights to P(C=1) vs P(O=1)
(Hypothesis: P(O=1) will be equal or close to its prior in many cases.)
31. (c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Thank you!