Part of the Search Engine course given in the Technion (2011)


Transcript

• 1. Computational advertising Kira Radinsky Slides based on material from the paper “Bandits for Taxonomies: A Model-based Approach” by Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski, in SDM 2007
• 5. The Content Match Problem (figure labels: Advertisers, Ads DB, Ads) Maximizing the number of clicks means: • For each webpage, find the ad with the best Click-Through Rate (CTR) • but without wasting too many impressions in learning this.
• 6. Outline Problem Background: Multi-armed bandits • Proposed Multi-level Policy • Experiments • Conclusions
• 7. Background: Bandits Bandit “arms” with unknown payoff probabilities p_1, p_2, p_3 Pull arms sequentially so as to maximize the total expected reward • Estimate payoff probabilities p_i • Bias the estimation process towards better arms
• 8. Background: Bandits Solutions • Try 1: Greedy solution: • Compute the sample mean of each arm by dividing the total reward received from the arm by the number of times the arm has been pulled. At each time step, choose the arm with the highest sample mean. • Try 2: Naïve solution: • Pull each arm an equal number of times. • Epsilon-greedy strategy: • The arm with the best sample mean is selected for a proportion 1 − ε of the trials, and another arm is selected at random (with uniform probability) for a proportion ε. • Many more strategies
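The epsilon-greedy strategy on this slide can be sketched as a small simulation. This is a minimal illustration, not code from the course: the function name `epsilon_greedy`, the Bernoulli reward model, and the choice to treat unpulled arms as having sample mean 0 are all assumptions. For simplicity the exploration step draws uniformly over all arms, including the current best one.

```python
import random


def epsilon_greedy(true_probs, n_pulls, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms.

    true_probs: hypothetical payoff probability of each arm (unknown to the policy).
    Returns (pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms
    rewards = [0] * n_arms
    total = 0
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest sample mean
            # (assumption: an arm that was never pulled counts as mean 0).
            means = [rewards[i] / pulls[i] if pulls[i] else 0.0
                     for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: means[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        rewards[arm] += reward
        total += reward
    return pulls, total
```

Setting `epsilon=0.0` reduces this to the greedy solution. With arms `[0.0, 1.0]` the greedy variant never leaves the first arm, because it never collects evidence about the second one, which is the failure mode that exploration is meant to fix.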
• 9. Ad matching as a bandit problem Webpage 1, Webpage 2, Webpage 3 Bandit “arms” = ads • ~10^6 ads • ~10^9 pages
• 10. Ad matching as a bandit problem Ads Webpages Content Match = A matrix • Each row is a bandit • Each cell has an unknown CTR One instance of the MAB problem (1 bandit) Unknown CTR
• 11. Background: Bandits Bandit Policy 1. Assign priority to each arm 2. “Pull” arm with max priority, and observe reward 3. Update priorities (figure labels: Priority 1, Priority 2, Priority 3; Allocation; Estimation)
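The three-step loop above (assign priorities, pull the max-priority arm, update) can be sketched as follows. The slide does not say which priority function is used, so this sketch plugs in the standard UCB1 index as an assumption; the names `ucb1_priority` and `run_policy` are hypothetical.

```python
import math
import random


def ucb1_priority(clicks, pulls, t):
    """UCB1 index: sample mean plus an exploration bonus (an assumed priority)."""
    if pulls == 0:
        return float("inf")  # force every arm to be tried once
    return clicks / pulls + math.sqrt(2.0 * math.log(t) / pulls)


def run_policy(true_probs, n_rounds, seed=0):
    """Run the assign / pull / update loop on simulated Bernoulli arms."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    clicks = [0] * n_arms
    pulls = [0] * n_arms
    for t in range(1, n_rounds + 1):
        # 1. Assign a priority to each arm.
        priorities = [ucb1_priority(clicks[i], pulls[i], t) for i in range(n_arms)]
        # 2. Pull the arm with max priority and observe the reward.
        arm = max(range(n_arms), key=lambda i: priorities[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # 3. Update the statistics that the priorities are computed from.
        pulls[arm] += 1
        clicks[arm] += reward
    return pulls, clicks
```

Any other index that trades off estimated payoff against uncertainty could replace `ucb1_priority` without changing the loop.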
• 12. Background: Bandits Why not simply apply a bandit policy directly to the problem? • Convergence is too slow: ~10^9 instances of the MAB problem (bandits), with ~10^6 arms per instance (bandit) • Additional structure is available that can help → Taxonomies
• 13. Outline Problem Background: Multi-armed bandits Proposed Multi-level Policy • Experiments • Conclusions
• 14. Multi-level Policy Consider only two levels (figure labels: Ads, Webpages, classes)
• 15. Multi-level Policy Consider only two levels (figure labels: ad parent classes Apparel, Computers, Travel; ad child classes; Block; One MAB problem instance (bandit))
• 16. Multi-level Policy Key idea: CTRs in a block are homogeneous (figure labels: ad parent classes Apparel, Computers, Travel; ad child classes; Block; One MAB problem instance (bandit))
• 17. Multi-level Policy • CTRs in a block are homogeneous – Used in allocation (picking ad for each new page) – Used in estimation (updating priorities after each observation)
• 18. Multi-level Policy • CTRs in a block are homogeneous – Used in allocation (picking ad for each new page) – Used in estimation (updating priorities after each observation)
• 19. Multi-level Policy (Allocation) • Classify webpage → page class, parent page class • Run bandit on ad parent classes → pick one ad parent class (figure labels: Page classifier; classes A, C, T)
• 20. Multi-level Policy (Allocation) • Classify webpage → page class, parent page class • Run bandit on ad parent classes → pick one ad parent class • Run bandit among cells → pick one ad class • In general, continue from root to leaf → final ad (figure labels: Page classifier; classes A, C, T; ad)
• 21. Multi-level Policy (Allocation) Bandits at higher levels • use aggregated information • have fewer bandit arms → Quickly figure out the best ad parent class (figure labels: Page classifier; classes A, C, T; ad)
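The allocation steps on the slides above can be sketched for two levels as below. This is an illustration under several assumptions rather than the method from the paper: epsilon-greedy stands in for the unspecified bandit policy, the page classifier is left to the caller, and the names `EpsGreedy`, `TwoLevelAllocator`, `allocate`, and `update` are hypothetical.

```python
import random
from collections import defaultdict


class EpsGreedy:
    """Epsilon-greedy bandit over a fixed list of arms (a stand-in policy)."""

    def __init__(self, arms, epsilon, rng):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng
        self.pulls = defaultdict(int)
        self.clicks = defaultdict(int)

    def _mean(self, arm):
        return self.clicks[arm] / self.pulls[arm] if self.pulls[arm] else 0.0

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self._mean)

    def update(self, arm, click):
        self.pulls[arm] += 1
        self.clicks[arm] += click


class TwoLevelAllocator:
    """Pick an ad parent class first, then an ad class inside it."""

    def __init__(self, ad_taxonomy, epsilon=0.1, seed=0):
        # ad_taxonomy maps each ad parent class to its list of ad child classes.
        self.ad_taxonomy = ad_taxonomy
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.parent_bandits = {}  # keyed by page parent class
        self.child_bandits = {}   # keyed by (page class, ad parent class)

    def _bandit(self, table, key, arms):
        if key not in table:
            table[key] = EpsGreedy(arms, self.epsilon, self.rng)
        return table[key]

    def allocate(self, page_parent, page_class):
        # Upper level: few arms, fed by aggregated clicks.
        ad_parent = self._bandit(
            self.parent_bandits, page_parent, self.ad_taxonomy).select()
        # Lower level: choose among the cells of the selected block.
        ad_class = self._bandit(
            self.child_bandits, (page_class, ad_parent),
            self.ad_taxonomy[ad_parent]).select()
        return ad_parent, ad_class

    def update(self, page_parent, page_class, ad_parent, ad_class, click):
        # Every observation updates both levels, so the parent bandit
        # sees the aggregate over all of its child classes.
        self.parent_bandits[page_parent].update(ad_parent, click)
        self.child_bandits[(page_class, ad_parent)].update(ad_class, click)
```

A caller would classify the page, call `allocate(page_parent, page_class)`, show an ad from the returned class, and report the click (0 or 1) through `update`. Extending the same pattern down a deeper taxonomy gives the root-to-leaf traversal mentioned on the slide.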
• 22. Multi-level Policy • CTRs in a block are homogeneous – Used in allocation (picking ad for each new page) – Used in estimation (updating priorities after each observation)
• 23. Multi-level Policy (Estimation) • CTRs in a block are homogeneous – Observations from one cell also give information about others in the block – How can we model this dependence?
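• 24. Multi-level Policy (Estimation) • Shrinkage Model: $S_{\text{cell}} \mid CTR_{\text{cell}} \sim \mathrm{Bin}(N_{\text{cell}}, CTR_{\text{cell}})$ and $CTR_{\text{cell}} \sim \mathrm{Beta}(\text{Params}_{\text{block}})$, where $S_{\text{cell}}$ = # clicks in cell and $N_{\text{cell}}$ = # impressions in cell • All cells in a block come from the same distribution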
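Because the Beta distribution is conjugate to the binomial, the posterior for a cell has a closed form. The derivation below is a sketch that assumes the block parameters are written as a pair $(a_{\text{block}}, b_{\text{block}})$; the slide only calls them $\text{Params}_{\text{block}}$, so this notation is an assumption.

```latex
\begin{aligned}
CTR_{\text{cell}} &\sim \mathrm{Beta}(a_{\text{block}},\, b_{\text{block}}) \\
S_{\text{cell}} \mid CTR_{\text{cell}} &\sim \mathrm{Bin}(N_{\text{cell}},\, CTR_{\text{cell}}) \\
\Longrightarrow\quad CTR_{\text{cell}} \mid S_{\text{cell}} &\sim
  \mathrm{Beta}\bigl(a_{\text{block}} + S_{\text{cell}},\;
                     b_{\text{block}} + N_{\text{cell}} - S_{\text{cell}}\bigr)
\end{aligned}
```

Clicks add to the first parameter and non-clicked impressions add to the second, so a cell with no data keeps the block-level distribution.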
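• 25. Multi-level Policy (Estimation) • Intuitively, this leads to shrinkage of cell CTRs towards block CTRs: $E[CTR] = \alpha \cdot \text{Prior}_{\text{block}} + (1-\alpha) \cdot S_{\text{cell}} / N_{\text{cell}}$, where $E[CTR]$ is the estimated CTR, $\text{Prior}_{\text{block}}$ is the Beta prior (“block CTR”), and $S_{\text{cell}} / N_{\text{cell}}$ is the observed CTR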
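The shrinkage formula can be made concrete with a short sketch. It assumes the block prior is Beta(a, b), in which case the prior mean is a / (a + b) and the weight works out to α = (a + b) / (a + b + N); the slide does not spell out α, so this parametrization and the function name `shrunk_ctr` are assumptions.

```python
def shrunk_ctr(clicks, impressions, a, b):
    """Posterior-mean CTR of a cell under an assumed Beta(a, b) block prior.

    Written in the slide's form:
    alpha * prior_block + (1 - alpha) * clicks / impressions.
    """
    prior_mean = a / (a + b)                  # "block CTR"
    alpha = (a + b) / (a + b + impressions)   # weight on the prior
    observed = clicks / impressions if impressions else 0.0
    return alpha * prior_mean + (1 - alpha) * observed
```

With no impressions α is 1 and the estimate equals the block CTR; as impressions grow, α goes to 0 and the estimate approaches the observed CTR of the cell. The result is algebraically equal to (a + clicks) / (a + b + impressions).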
• 26. Outline Problem Background: Multi-armed bandits Proposed Multi-level Policy Experiments • Conclusions
• 27. Experiments [S. Pandey et al. 2007] Taxonomy structure: Root (Depth 0), 20 nodes (Depth 1), 221 nodes (Depth 2), …, ~7000 leaves (Depth 7) • Use these 2 levels (marked in the figure)
• 28. Experiments • Data collected over a 1-day period • Collected from only one server, under some other ad-matching rules (not our bandit) • ~229M impressions • CTR values have been linearly transformed for purposes of confidentiality
• 29. Experiments (Multi-level Policy) Multi-level gives much higher #clicks (plot: clicks vs. number of pulls)
• 30. Experiments (Multi-level Policy) Multi-level gives much better Mean-Squared Error → it has learnt more from its explorations (plot: mean-squared error vs. number of pulls)
• 31. Conclusions • In a CTR-guided system, exploration is a key component • The short-term penalty for exploration needs to be limited (exploration budget) • Most exploration mechanisms use a weighted combination of the predicted CTR (average) and the CTR uncertainty (variance) • Exploration in a reduced-dimensional space: class hierarchy • Top-down traversal of the hierarchy to determine the class of the ad to show
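One way to read the "weighted combination of predicted CTR and CTR uncertainty" point is the sketch below. It is an illustration under stated assumptions, not a mechanism taken from the slides: it uses a Beta(a, b) prior on the CTR, takes the posterior standard deviation (the square root of the variance) as the uncertainty term, and the names `beta_mean_var`, `exploration_priority`, and the `weight` parameter are hypothetical.

```python
import math


def beta_mean_var(clicks, impressions, a=1.0, b=1.0):
    """Mean and variance of the Beta posterior for a CTR (assumed Beta(a, b) prior)."""
    post_a = a + clicks
    post_b = b + impressions - clicks
    total = post_a + post_b
    mean = post_a / total
    var = post_a * post_b / (total ** 2 * (total + 1))
    return mean, var


def exploration_priority(clicks, impressions, weight=2.0, a=1.0, b=1.0):
    """Predicted CTR plus a weighted uncertainty bonus."""
    mean, var = beta_mean_var(clicks, impressions, a, b)
    return mean + weight * math.sqrt(var)
```

Two ads with the same observed CTR then get different priorities: the one with fewer impressions has the larger variance and is explored first, while `weight` caps how much short-term CTR is given up for that exploration.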