Click Probability, Certainty and Context

Click Probability,
Certainty and Context
A practical Bayesian approach to deriving search
relevance judgments from web tracking
René Kriegler (@renekrie)
OSC Friday Share
12 February 2021

(c) René Kriegler, Click probability, certainty, context. 12 Feb 2021
Judgment modelling
Problem:
- How relevant is a given document for a given query?
Approach:
- Derive relevance judgments from observed user behaviour

Judgment modelling: aspects
Model shall reflect these aspects:
- Click probability: a quantity that tells us that how likely it is that a given
document is relevant to the query
- Certainty: How sure can we be about the click probability? (few
observations, high variance -> uncertainty)
- Context: try to eliminate influence of position, device, grid size, ...

Clicks first, conversions later
Philip Kotler, Kevin Lane
Marketing Management
1997
Peter Morville
Ambient Findability, 2005
Problem
recognition
Information
search
Evaluation
of
alternatives
Purchase
decision
Post-
purchase
behaviour
5-15 %
Most searches happen here, and most of them are
successful

Clicks first, conversions later
Problem
recognition
Information
search
Evaluation
of
alternatives
Purchase
decision
Post-
purchase
behaviour
conversions
Most searches happen here, and many of them are
successful

Let’s get started: CTR as probability
C: 1 if clicked, 0 if not
cqd
: observed clicks for query-document pair qd
vqd
: tracked views for query-document pair qd
Chuklin et al., Click Models for Web Search

CTR as probability
} ?
} ?
} ?

CTR as probability
} ?
Uncertainty! We
would expect the ctr
to change with the
next event that we
collect!

Probability: from scalar to function
Model an expected probability as a function of
parameters alpha and beta...

Model an expected probability as a function of
parameters alpha and beta...
... and update this with the observed clicks and
views

E[ctr]=0.1 | alpha=1, beta=9

E[ctr]=0.1 | alpha=1, beta=9
Shrinkage towards the
expected value!
fewer views:
=> greater uncertainty
=> greater shrinkage
=> the more we rely on the
expected value

Shrinkage vs. alpha and beta
The greater alpha and beta, the greater the shrinkage
=> we can model shrinkage together with E[ctr] by just using
alpha and beta

alpha and beta
How can we find good values for
alpha and beta?

Beta distribution: Beta(alpha,beta)
https://en.wikipedia.org/wiki/Beta_distribution
Expected value of Beta(alpha,beta):

Beta(alpha,beta)
Method of moments to estimate alpha and beta for Beta(alpha,beta):
s2
: sample variance
(alpha + beta) ~ 1/s2
=> the lower the variance, the greater the certainty, the greater (alpha+beta), the
greater the shrinkage, the more we trust our expected ctr value, the more views we need to diverge
from it
alpha ~ probability of success, beta ~ probability of failure
We can write Beta(mean,variance)

Welcome to Bayesian Modelling
Posterior
Prior Likelihood
function
Bayes’ Theorem with a ‘Beta prior’

Context
How can we eliminate position
bias and other contexts from our
model?
We can solve click probability and certainty, but...

Context: dealing with position bias
COEC - Clicks over expected clicks
r: rank at which clicks and views were observed
Same as above if query-doc pair was only observed at a
single rank

Context: dealing with position bias
COEC - Clicks over expected clicks
Normalise by expected ctr for a given rank => great intuition!
Does not deal with ‘certainty’ => but we are good at doing this using the beta distribution to
express the expected value and the likelihood!
Cancels out rank specific relevance together with position bias => we’ll deal with this!

Unpooled/per-rank parameters
Unpooled: each rank has its own expected value that reflects position bias and rank-specific
relevance => we cannot normalise by this expected value as it would remove relevance
information

Pooled parameter estimation
Pooled: use a single set of parameters/expected value across all data points => we do not
remove relevance information but we also cannot handle position bias

Partial pooling / hierarchical model
Partial pooling: each rank has its prior parameters (-
ctr-
r
, s2
r
) - like in the unpooled approach.
Now we assume a beta distribution of these priors with parameters µ (mean) and σ2
(variance)
Rank-specific priors modelled as Beta(µ,σ2
)

Partial pooling / hierarchical model
Prior at rank r: Beta(θr
,τ2
r
)
remember that we can calculate (θr
,τ2
r
) -> (αr
,βr
)
Weighted average of the pooled mean µ and the
rank-specific mean ctr
Using variance as weight! - If the ctr varies a lot at
rank r - the pooled mean becomes more important
Keeps global relevance information
Models position bias
Estimation of µ and σ2
:
- bootstrapping
- samples weighted by views at r
- Gelman et al.: use 99th percentile variance
Per rank (unpooled)
‘Prior of priors’

Aggregating across ranks
If a query-doc pair was observed at multiple ranks,
use the views-weighted rank-specific parameters
Do not attempt to aggregate the posteriors instead
of the priors (it took me 1.5 years to figure this out
;-) ) -> think: shrinkage per rank
Posterior:
(θqd
,τ2
qd
) -> (αqd
,βqd
)

From ranks to contexts
Solution:
● do not assume any relationship between ranks - results are not processed in a known
sequence
● Unpooled prior applies to a context j
desktop_2_1_5_9_false: device=desktop, page=2, results-per-row=5, position=5,
results-shown-on-page=9, has-marketing-label=false

Aggregating across contexts
Simply replace rank r with context j
If you don’t have observations for a given context,
you can fall back to using just (µ, σ2
)

How good are my parameters?
Binomial distribution f(p,k,n)
k: clicks, n: views
P: (scalar) prior probability
Beta-Binomial distribution f(α,β,k,n)
k: clicks, n: views
Beta(α,β) as prior
=> max likelihood for clicks & views
from test set

And the judgment?
Posterior:
(θqd
,τ2
qd
) -> (αqd
,βqd
)
Judgment:
(Which is posterior probability over expected
value)

Next: Combine w/ CRO
Conversion probabilities can be modelled the same way, i.e. using a hierarchical model with
context-specific parameters:
P(O=1) is the probability of a purchase order
Model P(C=1) and P(O=1) separately and multiply. You can add a weights to P(C=1) vs P(O=1)
(Hypothesis: P(O=1) will be equal or close to its prior in many cases.)

Thank you!

Click Probability, Certainty and Context

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Click Probability, Certainty and Context

Similar to Click Probability, Certainty and Context (10)

Recently uploaded

Recently uploaded (20)

Click Probability, Certainty and Context