Tutorial 11 (computational advertising)

Part of the Search Engine course given at the Technion (2011)


  1. 1. Computational Advertising (Kira Radinsky). Slides based on material from the paper “Bandits for Taxonomies: A Model-based Approach” by Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, and Vanja Josifovski, SDM 2007
  2. 2. The Content Match Problem (diagram: advertisers → ads DB → ads shown on pages) Ad impression: showing an ad to a user
  3. 3. The Content Match Problem Ad click: a user click brings revenue to the ad server and the content provider
  4. 4. The Content Match Problem The Content Match Problem: match ads to pages to maximize clicks
  5. 5. The Content Match Problem Maximizing the number of clicks means: for each webpage, find the ad with the best Click-Through Rate (CTR), but without wasting too many impressions in learning this
  6. 6. Outline • Problem • Background: Multi-armed bandits • Proposed Multi-level Policy • Experiments • Conclusions
  7. 7. Background: Bandits Bandit “arms” with unknown payoff probabilities p1, p2, p3. Pull arms sequentially so as to maximize the total expected reward: • Estimate the payoff probabilities pi • Bias the estimation process towards better arms
  8. 8. Background: Bandits Solutions • Try 1: Greedy solution: compute the sample mean of an arm by dividing the total reward received from the arm by the number of times the arm has been pulled; at each time step, choose the arm with the highest sample mean. • Try 2: Naïve solution: pull each arm an equal number of times. • Epsilon-greedy strategy: the arm with the best sample mean is selected for a proportion 1 − ε of the trials, and another arm is selected uniformly at random for a proportion ε (a sketch follows below). • Many more strategies exist.
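A minimal, illustrative sketch of the ε-greedy strategy in Python (the function name, the simulated Bernoulli arms, and all parameter values are assumptions made for this example, not material from the original slides):

```python
import random

def epsilon_greedy(true_ctrs, n_pulls, epsilon=0.1):
    """Epsilon-greedy bandit: explore with probability epsilon, otherwise exploit."""
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms        # times each arm has been pulled
    rewards = [0.0] * n_arms    # total reward collected per arm
    total_clicks = 0.0
    for _ in range(n_pulls):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)   # explore: pick a uniformly random arm
        else:
            # Exploit: pick the arm with the highest sample mean
            # (never-pulled arms get infinite priority so each is tried at least once).
            arm = max(range(n_arms),
                      key=lambda a: rewards[a] / pulls[a] if pulls[a] else float("inf"))
        # Simulated Bernoulli reward: a click with probability true_ctrs[arm]
        reward = 1.0 if random.random() < true_ctrs[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += reward
        total_clicks += reward
    return total_clicks, pulls

# Three arms with unknown payoff probabilities p1, p2, p3 (values are made up):
clicks, pulls = epsilon_greedy([0.02, 0.05, 0.01], n_pulls=10_000)
```

With ε = 0.1 the policy spends roughly 90% of its pulls on the arm with the best sample mean, so most impressions go to the apparently best ad while every arm keeps receiving occasional exploratory pulls.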
  9. 9. Ad matching as a bandit problem Each webpage is one bandit; the bandit “arms” are the ads. Scale: ~10^6 ads, ~10^9 pages
  10. 10. Ad matching as a bandit problem Content match = a matrix: webpages are rows, ads are columns. Each row is a bandit, i.e., one instance of the MAB problem; each cell has an unknown CTR
  11. 11. Background: Bandits A bandit policy (see the sketch below): 1. Assign a priority to each arm 2. “Pull” the arm with the maximum priority, and observe the reward 3. Update the priorities. Steps 1–2 are the allocation phase; step 3 is the estimation phase.
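The slides leave the priority function unspecified. One standard choice, assumed here purely for illustration (UCB1, not necessarily what the tutorial or the paper uses), sets the priority to the sample mean plus an uncertainty bonus:

```python
import math
import random

def ucb_policy(true_ctrs, n_pulls):
    """Priority-based bandit loop; priority = sample mean + exploration bonus (UCB1)."""
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_pulls + 1):
        # 1. Assign a priority to each arm (unpulled arms get infinite priority).
        def priority(a):
            if pulls[a] == 0:
                return float("inf")
            return means[a] + math.sqrt(2.0 * math.log(t) / pulls[a])
        # 2. "Pull" the arm with maximum priority and observe the (simulated) reward.
        arm = max(range(n_arms), key=priority)
        reward = 1.0 if random.random() < true_ctrs[arm] else 0.0
        # 3. Update the priorities via the running sample mean.
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]
    return means, pulls
```

The bonus term shrinks as an arm accumulates pulls, so the policy naturally shifts from exploration to exploitation.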
  12. 12. Background: Bandits Why not simply apply a bandit policy directly to the problem? • Convergence is too slow: ~10^9 instances of the MAB problem (bandits), with ~10^6 arms per instance (bandit) • Additional structure is available that can help: taxonomies
  13. 13. Outline • Problem • Background: Multi-armed bandits • Proposed Multi-level Policy • Experiments • Conclusions
  14. 14. Multi-level Policy (diagram: taxonomies of classes over both webpages and ads) Consider only two levels of each taxonomy
  15. 15. Multi-level Policy Consider only two levels: ad parent classes (e.g., Apparel, Computers, Travel) and the ad child classes beneath them. A block = one MAB problem instance (bandit)
  16. 16. Multi-level Policy Key idea: CTRs in a block are homogeneous (same two-level picture: ad parent classes, ad child classes; block = one MAB problem instance)
  17. 17. Multi-level Policy • CTRs in a block are homogeneous – Used in allocation (picking an ad for each new page) – Used in estimation (updating priorities after each observation)
  18. 18. Multi-level Policy • CTRs in a block are homogeneous ✓ Used in allocation (picking an ad for each new page) – Used in estimation (updating priorities after each observation)
  19. 19. Multi-level Policy (Allocation) (diagram: page classifier and the ad taxonomy) • Classify the webpage → page class, parent page class • Run a bandit on the ad parent classes → pick one ad parent class
  20. 20. Multi-level Policy (Allocation) • Classify the webpage → page class, parent page class • Run a bandit on the ad parent classes → pick one ad parent class • Run a bandit among the cells → pick one ad class • In general, continue from root to leaf → final ad
  21. 21. Multi-level Policy (Allocation) Bandits at higher levels: • use aggregated information • have fewer bandit arms → quickly figure out the best ad parent class (a sketch of the two-level allocation follows below)
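A toy sketch of the two-level allocation in Python. It simplifies the scheme to a single page class, uses a tiny ε-greedy bandit in place of whatever policy the paper runs at each level, and every class name and helper here is invented for the example:

```python
import random

class Bandit:
    """Minimal epsilon-greedy bandit over a list of named arms (illustrative only)."""
    def __init__(self, arms, epsilon=0.1):
        self.arms, self.eps = list(arms), epsilon
        self.pulls = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}

    def select(self):
        if random.random() < self.eps:
            return random.choice(self.arms)
        return max(self.arms,
                   key=lambda a: self.means[a] if self.pulls[a] else float("inf"))

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.pulls[arm]

# One bandit over the ad parent classes, plus one bandit per block over its child classes.
parent_bandit = Bandit(["Apparel", "Computers", "Travel"])
child_bandits = {p: Bandit([f"{p}/child{i}" for i in range(3)])
                 for p in parent_bandit.arms}

def allocate_ad():
    """Two-level allocation: parent-level bandit first, then the chosen block's bandit."""
    parent = parent_bandit.select()          # few arms, aggregated data: converges quickly
    child = child_bandits[parent].select()   # pick an ad class inside the chosen block
    return parent, child

def record_feedback(parent, child, clicked):
    """Propagate the observed reward (click / no click) to both levels."""
    reward = 1.0 if clicked else 0.0
    parent_bandit.update(parent, reward)
    child_bandits[parent].update(child, reward)
```

Because the parent-level bandit aggregates the data of all its children, it needs far fewer pulls to identify the best ad parent class than any single cell would.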
  22. 22. Multi-level Policy • CTRs in a block are homogeneous ✓ Used in allocation (picking an ad for each new page) ✓ Used in estimation (updating priorities after each observation)
  23. 23. Multi-level Policy (Estimation) • CTRs in a block are homogeneous – Observations from one cell also give information about others in the block – How can we model this dependence?
  24. 24. Multi-level Policy (Estimation) • Shrinkage model: S_cell | CTR_cell ~ Binomial(N_cell, CTR_cell), CTR_cell ~ Beta(params_block), where S_cell = # clicks in the cell and N_cell = # impressions in the cell. All cells in a block come from the same distribution
  25. 25. Multi-level Policy (Estimation) • Intuitively, this leads to shrinkage of cell CTRs towards the block CTR: E[CTR_cell] = α · prior_block + (1 − α) · S_cell / N_cell, i.e., the estimated CTR is a weighted combination of the Beta prior (the “block CTR”) and the observed CTR (a short sketch follows below)
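The weighted combination above is exactly the posterior mean of the Beta-Binomial model on the previous slide. A short sketch, assuming the block prior has already been fit as a Beta(a, b) distribution (the function name and arguments are illustrative):

```python
def shrunk_ctr(clicks, impressions, a_block, b_block):
    """Posterior-mean CTR estimate under the Beta-Binomial shrinkage model.

    S_cell | CTR_cell ~ Binomial(N_cell, CTR_cell), CTR_cell ~ Beta(a, b).
    The posterior mean (a + S) / (a + b + N) rearranges into the slide's formula:
        E[CTR] = alpha * prior_block + (1 - alpha) * S_cell / N_cell
    with alpha = (a + b) / (a + b + N_cell) and prior_block = a / (a + b).
    """
    prior_block = a_block / (a_block + b_block)
    if impressions == 0:
        return prior_block                 # no data yet: fall back to the block CTR
    alpha = (a_block + b_block) / (a_block + b_block + impressions)
    observed = clicks / impressions
    return alpha * prior_block + (1 - alpha) * observed

# A cell with 3 clicks in 50 impressions, in a block with prior Beta(2, 98):
estimate = shrunk_ctr(3, 50, a_block=2, b_block=98)
# observed CTR 0.06 is shrunk toward the block CTR 0.02, giving about 0.033
```

The fewer impressions a cell has, the larger α is and the more its estimate leans on the block CTR; as N_cell grows, the estimate converges to the cell's own observed CTR.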
  26. 26. Outline • Problem • Background: Multi-armed bandits • Proposed Multi-level Policy • Experiments • Conclusions
  27. 27. Experiments [S. Pandey et al., 2007] Taxonomy structure: the root (depth 0), 20 nodes (depth 1), 221 nodes (depth 2), …, ~7000 leaves (depth 7). The multi-level policy uses the two levels at depths 1 and 2
  28. 28. Experiments • Data collected over a one-day period • Collected from only one server, under some other ad-matching rules (not our bandit) • ~229M impressions • CTR values have been linearly transformed for confidentiality
  29. 29. Experiments (Multi-level Policy) (plot: clicks vs. number of pulls) The multi-level policy yields many more clicks
  30. 30. Experiments (Multi-level Policy) (plot: mean-squared error vs. number of pulls) The multi-level policy achieves a much lower mean-squared error → it has learned more from its explorations
  31. 31. Conclusions • In a CTR-guided system, exploration is a key component • The short-term penalty of exploration needs to be limited (an exploration budget) • Most exploration mechanisms use a weighted combination of the predicted CTR (the mean) and the CTR uncertainty (the variance) • Exploration in a reduced-dimensional space: the class hierarchy • Top-down traversal of the hierarchy determines the class of the ad to show
