1.
Computational advertising
Kira Radinsky
Slides based on material from the paper
“Bandits for Taxonomies: A Model-based Approach” by
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti,
Vanja Josifovski, in SDM 2007
2.
The Content Match Problem
[Diagram: advertisers submit ads to an ads DB; an ad server shows ads on content pages]
Ad impression: showing an ad to a user
3.
The Content Match Problem
[Diagram: same setup; the user clicks the shown ad]
Ad click: a user click generates revenue for the ad server and the content provider
4.
The Content Match Problem
[Diagram: advertisers, ads DB, content pages]
The Content Match Problem: match ads to pages to maximize clicks
5.
The Content Match Problem
[Diagram: advertisers, ads DB, content pages]
Maximizing the number of clicks means: for each webpage, find the ad with the best click-through rate (CTR), but without wasting too many impressions in learning this.
7.
Background: Bandits
[Diagram: three bandit arms with unknown payoff probabilities p1, p2, p3]
Pull arms sequentially so as to maximize the total
expected reward
• Estimate the payoff probabilities p_i
• Bias the estimation process towards better arms
8.
Background: Bandits Solutions
• Try 1: Greedy solution:
• Compute the sample mean of an arm by dividing the total reward received from the arm by the number of times the arm has been pulled. At each time step, choose the arm with the highest sample mean. (This never explores, so it can lock onto a suboptimal arm that got lucky early.)
• Try 2: Naïve solution:
• Pull each arm an equal number of times. (This wastes impressions on arms already known to be bad.)
• Epsilon-greedy strategy:
• The best arm is selected for a proportion 1 − ε of the trials, and another arm is selected uniformly at random for a proportion ε.
• Many more strategies
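The epsilon-greedy strategy above can be sketched in a few lines. The arm CTRs, epsilon, and pull budget below are illustrative, not from the slides:

```python
import random

def epsilon_greedy(true_ctrs, epsilon=0.1, n_pulls=10000, seed=0):
    """Epsilon-greedy bandit: explore a uniformly random arm with
    probability epsilon, otherwise pull the arm with the highest
    sample mean observed so far."""
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms
    rewards = [0.0] * n_arms
    total = 0.0
    for _ in range(n_pulls):
        if rng.random() < epsilon or all(p == 0 for p in pulls):
            arm = rng.randrange(n_arms)  # explore
        else:
            # exploit: arm with the highest sample mean
            arm = max(range(n_arms),
                      key=lambda a: rewards[a] / pulls[a] if pulls[a] else 0.0)
        reward = 1.0 if rng.random() < true_ctrs[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += reward
        total += reward
    return total, pulls

total, pulls = epsilon_greedy([0.02, 0.05, 0.10])
# the best arm (index 2) should receive the bulk of the pulls
```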
9.
Ad matching as a bandit problem
[Diagram: each webpage (Webpage1, Webpage2, Webpage3) is one bandit whose arms are the ads]
~10^6 ads, ~10^9 pages
10.
Ad matching as a bandit problem
Content Match = a matrix: rows are webpages, columns are ads
• Each row is one bandit (one instance of the MAB problem)
• Each cell has an unknown CTR
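The matrix view can be sketched as a sparse table of per-(page, ad) cells, each tracking its own click and impression counts. The page and ad identifiers below are hypothetical:

```python
from collections import defaultdict

class Cell:
    """One (page, ad) cell of the content-match matrix: the true CTR is
    unknown and must be estimated from click/impression counts."""
    def __init__(self):
        self.impressions = 0
        self.clicks = 0

    @property
    def ctr_estimate(self):
        # None until the cell has been shown at least once
        return self.clicks / self.impressions if self.impressions else None

# rows = webpages (each row is one bandit), columns = ads
matrix = defaultdict(lambda: defaultdict(Cell))

cell = matrix["page_42"]["ad_7"]  # hypothetical ids
cell.impressions += 100
cell.clicks += 3
print(cell.ctr_estimate)  # 0.03
```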
11.
Background: Bandits
Bandit policy:
1. Assign a priority to each arm
2. “Pull” the arm with the maximum priority, and observe the reward
3. Update the priorities
Steps 1–2 are the allocation step; step 3 is the estimation step.
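The allocate/observe/update loop above needs a concrete priority rule. UCB1 (sample mean plus a confidence bonus) is one common choice, used here purely as an illustration; the slides do not prescribe it:

```python
import math
import random

def ucb1(true_ctrs, n_pulls=5000, seed=1):
    """Priority-based bandit policy with UCB1 priorities:
    priority(arm) = sample mean + sqrt(2 ln t / pulls)."""
    rng = random.Random(seed)
    n = len(true_ctrs)
    pulls = [0] * n
    means = [0.0] * n
    for t in range(1, n_pulls + 1):
        # 1. assign a priority to each arm (unpulled arms go first)
        def priority(a):
            if pulls[a] == 0:
                return float("inf")
            return means[a] + math.sqrt(2 * math.log(t) / pulls[a])
        # 2. pull the max-priority arm and observe the reward
        arm = max(range(n), key=priority)
        reward = 1.0 if rng.random() < true_ctrs[arm] else 0.0
        # 3. update the priorities via the running sample mean
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]
    return pulls

pulls = ucb1([0.02, 0.05, 0.10])
# the best arm (index 2) accumulates the most pulls
```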
12.
Background: Bandits
Why not simply apply a bandit policy
directly to the problem?
• Convergence is too slow: ~10^9 instances of the MAB problem (bandits), with ~10^6 arms per instance (bandit)
• Additional structure is available that can help: taxonomies
14.
Multi-level Policy
[Diagram: both webpages and ads are grouped into taxonomies of classes]
Consider only two levels
15.
Multi-level Policy
[Diagram: ad parent classes (Apparel, Computers, Travel), each split into ad child classes; a block of child-class cells under one parent class is one MAB problem instance (one bandit)]
Consider only two levels
16.
Multi-level Policy
[Diagram: same taxonomy; a block is one MAB problem instance (one bandit)]
Key idea: CTRs in a block are homogeneous
17.
Multi-level Policy
• CTRs in a block are
homogeneous
– Used in allocation (picking ad for
each new page)
– Used in estimation (updating
priorities after each observation)
18.
Multi-level Policy
• CTRs in a block are homogeneous
– Used in allocation (picking ad for each new page)
– Used in estimation (updating priorities after each observation)
19.
Multi-level Policy (Allocation)
[Diagram: a page classifier maps the incoming webpage into the page taxonomy (classes A, C, T)]
• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
20.
Multi-level Policy (Allocation)
[Diagram: page classifier and taxonomy; the chosen path ends at the served ad]
• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
• Run a bandit among the cells → pick one ad class
• In general, continue from root to leaf → final ad
21.
Multi-level Policy (Allocation)
[Diagram: page classifier and taxonomy]
Bandits at higher levels:
• use aggregated information
• have fewer bandit arms
→ quickly figure out the best ad parent class
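The root-to-leaf allocation can be sketched as a bandit at each level of the ad taxonomy, with observations aggregated up the tree. The taxonomy, CTRs, and the use of UCB1 as the per-level bandit are all illustrative assumptions; the paper's actual policy differs in its details:

```python
import math
import random

class ArmStats:
    """Click/pull counts for one bandit arm (a class or an ad)."""
    def __init__(self):
        self.pulls = 0
        self.clicks = 0.0

    def update(self, reward):
        self.pulls += 1
        self.clicks += reward

    def ucb(self, t):
        # UCB1 priority: sample mean plus confidence bonus
        if self.pulls == 0:
            return float("inf")
        return self.clicks / self.pulls + math.sqrt(2 * math.log(t) / self.pulls)

def allocate(taxonomy, stats, t):
    """Top-down allocation: at each level, run a bandit among the
    children of the current node and descend into the winner."""
    node = "root"
    while node in taxonomy:  # stop at a leaf (an ad)
        node = max(taxonomy[node], key=lambda c: stats[c].ucb(t))
    return node

# hypothetical two-level ad taxonomy: parent classes -> ads (leaves)
taxonomy = {
    "root": ["Apparel", "Computers", "Travel"],
    "Apparel": ["ad_a1", "ad_a2"],
    "Computers": ["ad_c1", "ad_c2"],
    "Travel": ["ad_t1", "ad_t2"],
}
true_ctr = {"ad_a1": 0.01, "ad_a2": 0.02, "ad_c1": 0.03,
            "ad_c2": 0.08, "ad_t1": 0.02, "ad_t2": 0.01}
parent_of = {c: p for p, cs in taxonomy.items() for c in cs}
stats = {n: ArmStats() for n in parent_of}  # every node except the root

rng = random.Random(2)
for t in range(1, 20001):
    ad = allocate(taxonomy, stats, t)
    reward = 1.0 if rng.random() < true_ctr[ad] else 0.0
    # update the served ad and aggregate the observation up the tree,
    # so that higher-level bandits use aggregated information
    node = ad
    while node != "root":
        stats[node].update(reward)
        node = parent_of[node]
```

Because the parent-level bandit has only three arms and sees aggregated counts, it settles on the best parent class quickly, and most pulls flow to the best ad under it.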
22.
Multi-level Policy
• CTRs in a block are homogeneous
– Used in allocation (picking ad for each new page)
– Used in estimation (updating priorities after each observation)
23.
Multi-level Policy (Estimation)
• CTRs in a block are
homogeneous
– Observations from one cell also
give information about others in
the block
– How can we model this
dependence?
24.
Multi-level Policy (Estimation)
• Shrinkage model:
S_cell | CTR_cell ~ Binomial(N_cell, CTR_cell)
CTR_cell ~ Beta(params_block)
where S_cell is the number of clicks in the cell and N_cell is the number of impressions in the cell.
All cells in a block come from the same distribution.
25.
Multi-level Policy (Estimation)
• Intuitively, this leads to shrinkage of cell CTRs towards the block CTR:
E[CTR_cell] = α · Prior_block + (1 − α) · S_cell / N_cell
where Prior_block is the Beta prior mean (the “block CTR”) and S_cell / N_cell is the observed CTR.
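Under Beta–Binomial conjugacy, the posterior mean of CTR_cell with a Beta(a, b) block prior is (a + S) / (a + b + N), which is exactly the weighted average above with α = (a + b) / (a + b + N). A minimal sketch; the Beta parameters below are illustrative, not from the paper:

```python
def shrunk_ctr(s_cell, n_cell, a_block, b_block):
    """Posterior-mean CTR under a Beta(a, b) block prior and a
    Binomial(N, CTR) likelihood: (a + S) / (a + b + N)."""
    return (a_block + s_cell) / (a_block + b_block + n_cell)

def as_weighted_average(s_cell, n_cell, a_block, b_block):
    """The same estimate written as alpha*prior + (1 - alpha)*observed."""
    prior = a_block / (a_block + b_block)  # block CTR
    alpha = (a_block + b_block) / (a_block + b_block + n_cell)
    observed = s_cell / n_cell if n_cell else 0.0
    return alpha * prior + (1 - alpha) * observed

# a cell with few impressions is shrunk strongly toward the block CTR;
# a cell with many impressions stays close to its observed CTR
print(shrunk_ctr(1, 10, a_block=2, b_block=98))      # 3/110 ≈ 0.027
print(shrunk_ctr(100, 1000, a_block=2, b_block=98))  # 102/1100 ≈ 0.093
```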
27.
Experiments [S. Pandey et al. 2007]
Taxonomy structure:
• Depth 0: root
• Depth 1: 20 nodes
• Depth 2: 221 nodes
• …
• Depth 7: ~7000 leaves
The multi-level policy uses these two levels (depths 1 and 2).
28.
Experiments
• Data collected over a one-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for confidentiality
29.
Experiments (Multi-level Policy)
[Plot: clicks vs. number of pulls] The multi-level policy yields many more clicks.
30.
Experiments (Multi-level Policy)
[Plot: mean-squared error vs. number of pulls] The multi-level policy achieves much lower mean-squared error: it has learnt more from its explorations.
31.
Conclusions
• In a CTR-guided system, exploration is a key component
• The short-term penalty for exploration needs to be limited (an exploration budget)
• Most exploration mechanisms use a weighted combination of the predicted CTR (average) and the CTR uncertainty (variance)
• Exploration can be done in a reduced-dimensional space: the class hierarchy
• A top-down traversal of the hierarchy determines the class of the ad to show