1. Ad science bid simulator (public ver)
2. Outline
● Tradeoff between Online/Offline evaluation
● Simulator Architecture
○ Environment (traffic & auction)
○ Controller (algorithms)
○ Algorithms
● Data
○ How do we simulate auction
○ Example results
● Results
○ How do we evaluate
○ The insights it brings us
● Appendix (details for adScience folks)
3.
4. Scope difference: Indeed adSystem vs. DSP only
● Note that our scope is quite different from the DSP companies out there.
When you own the whole system (bid + auction)
Your bid algorithm serves every ad
=> pursue "smoothly making 100% use of the client's budget".
=> low bid deviation and linear pacing are virtues here.
=> rapidly changing algorithms may make the whole system unpredictable.
When you are a DSP company
You only care about the ads you control
=> you can never "change the rules of the whole world".
=> pursue "beating competitors' algorithms".
=> agile algorithms may gain an advantage.
=> CPC/CPA is the most important thing.
[Diagram: as a DSP, your bidding covers ~5% of the auction and "other bidders" the remaining 95%; with your own auction system, you cover 100%.]
5. Concern on online A/B test
● Test groups are NOT independent; they join the same auctions.
○ It's an artificial world: the "environment" changes as you apply the new alg.
● Ideally, to make results comparable, you would need to run both the new and old alg at 100% on the same time window and ads => impossible in the real world.
● Simple example: an algorithm that always does max_bid in the last 3 days might work at 5%, not at 100%.
When 100% on product, the "world" has changed
You want an alg that can compete with itself, not an alg taking advantage of the old one.
During an A/B test
the new alg is competing with the old alg; you're testing whether it beats the "old world" formed by the old alg.
[Diagram: A/B test with new alg on 5% and old alg on 95%, vs. full rollout with new alg on 100%.]
6. Concern on online A/B test
● The market trend is volatile, and even worse -- the cycle is long.
● For the job market, traffic and user behavior may differ during long holidays, the new year, and graduation season.
=> a test run in January or June could lead to the opposite conclusion. (ex: past tests of traffic prediction / initial bid)
=> it's not feasible to run every A/B test for a whole year.
● We're not the only player.
● Our downstream products (auction / SERP / deboost) are evolving as well while we're running the test.
● Closely affected by search results.
7. Concern on offline simulation
● Lack of auction details for replaying the bidding history.
○ Currently we only have the winning auction data (adid, revenue, position, time).
○ What we do not have:
a. There are multiple SJ slots per SERP in an auction; we don't know the other winning prices (2nd/3rd/4th…) in the same auction.
b. The real "auction utility" score components, and the users' interests.
● The compromises we need to make:
○ We only take data from the "top" auction position, so that CTR/price are comparable.
○ Ignore eCTR/eApplyRate; bid by price only.
● Fortunately, we still have the most important thing: bid and revenue (2nd price).
8.
9. Simulator Architecture - concept
● Objective: minimize customized/repeated logic, minimize interfaces.
○ Encapsulate each bidding algorithm as a "controller".
○ Share campaign management / traffic pattern / auction as "environment" logic.
○ Simulate different traffic patterns / market competition by changing only the environment.
○ Simulate different algorithms against various environments.
○ Idea: we cannot guarantee simulation = reality, but we can choose an algorithm that survives all kinds of extreme simulated environments.
10. Simulator Architecture - modules
● The real implementation:
○ Controllers (stateless): Baseline, HighFreq, PID, RBO (A/B), DDPG / PPO. They read the campaign state and output a bid.
○ Environment: Auction (stateless) plus Campaign Management (which manages all states). It receives the bid and spend_cap and returns won auctions, clicks, and revenue.
○ Evaluator: consumes the logs and produces the metrics.
[Diagram: controllers and the environment exchange campaign state, bids, and auction outcomes; logs flow to the evaluator.]
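To make the controller/environment split concrete, here is a minimal Python sketch of the module boundaries described on this slide; all class and method names (CampaignState, Controller, Environment, bid, step) are illustrative assumptions, not the actual simulator API.

```python
# Minimal sketch of the controller / environment split (names are assumptions).
from dataclasses import dataclass
from typing import List


@dataclass
class CampaignState:
    budget_left: float
    inventory_left: float
    budget_used: float = 0.0
    inventory_used: float = 0.0
    last_bid: float = 1.0


class Controller:
    """Stateless bidding algorithm: campaign state in, bid out."""

    def bid(self, state: CampaignState) -> float:
        raise NotImplementedError


class Environment:
    """Owns campaign management (all state) and the stateless auction."""

    def __init__(self, states: List[CampaignState]):
        self.states = states
        self.logs: List[dict] = []

    def step(self, controllers: List[Controller]) -> None:
        bids = [c.bid(s) for c, s in zip(controllers, self.states)]
        # ... run one auction round here, then update spend / clicks / revenue ...
        self.logs.append({"bids": bids})
```

In this sketch the evaluator would simply read Environment.logs and compute the metrics.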
11. Algorithms
● Baseline
○ The current (as of Jan 2020) production algorithm.
○ The only algorithm among those we benchmarked that updates bids just 3 times a day.
● HighFreq
○ Main idea: raise the bid update frequency from 3 times a day to once per hour.
● PID
○ A classic control-system template.
○ Decides the next action from Proportional / Integral / Derivative components.
● RBO
○ An original design from the adScience bidAlgo team.
○ Learns from the ad's own history.
12. Baseline Algorithm
● Baseline
1. Main idea: compare the "last period bid result" with the "target spend".
2. tsr (target spend rate) = budget_left / inventory_left
3. asr (actual spend rate) = budget_used / inventory_used
4. new_bid = last_bid * (tsr / asr) (simplified version, for easy understanding)
[Diagram: (A) ideal target = budget remaining / inventory remaining (target spend_rate); (B) last observation = budget used / inventory used (actual spend_rate); (C) calibration of the last bid into the new bid.]
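As a small illustration, here is a minimal Python sketch of the simplified update rule above; variable names follow the slide (tsr, asr), and the real product logic includes more guardrails than this.

```python
def baseline_new_bid(last_bid, budget_left, inventory_left,
                     budget_used, inventory_used):
    """Simplified Baseline update: scale the last bid by target vs. actual
    spend rate. A sketch of the slide's formula, not the product code."""
    tsr = budget_left / inventory_left      # target spend rate
    asr = budget_used / inventory_used      # actual spend rate
    return last_bid * (tsr / asr)

# e.g. under-spending (asr < tsr) raises the bid:
# baseline_new_bid(1.0, 50, 100, 30, 100) -> 1.0 * (0.5 / 0.3) ~= 1.67
```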
13. PID controller
● PID
1. Main idea: decide the next change ratio from the 3 components P + I + D.
2. P (proportional) = p_ratio * (1 - actual_spend / target_spend)
3. I (integral) = i_ratio * SUM(errors over the past k periods)
4. D (derivative) = d_ratio * (last_error - last_last_error)
5. new_bid = old_bid * (1 + P + I + D)
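A minimal Python sketch of this P + I + D update, assuming the error history is kept per ad; the gains (p_ratio, i_ratio, d_ratio) below are placeholder values, not the tuned production parameters.

```python
def pid_new_bid(old_bid, actual_spend, target_spend, error_history,
                p_ratio=0.5, i_ratio=0.1, d_ratio=0.1, k=5):
    """Sketch of the P+I+D bid update above (placeholder gains)."""
    error = 1 - actual_spend / target_spend        # under-spend => positive error
    p = p_ratio * error
    i = i_ratio * sum(error_history[-k:])          # integral over the past k errors
    if len(error_history) >= 2:
        d = d_ratio * (error_history[-1] - error_history[-2])
    else:
        d = 0.0
    error_history.append(error)
    return old_bid * (1 + p + i + d)
```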
14. RBO (responsive bid optimizer)
● RBO
1. Main idea: spend_rate = A * bid^2
2. Learn A from history!
[Diagram: a time series of bids (bid_t1 ... bid_t7) and spend rates (spend_rate_t1 ... spend_rate_t7) used to fit "A".]
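A minimal sketch of the RBO idea: fit A by least squares on the (bid, spend_rate) history, then invert the model to pick the bid for a desired spend rate. The fitting method and function names here are assumptions, not the production RBO implementation.

```python
import math

def fit_a(bids, spend_rates):
    """Least-squares fit of A in spend_rate = A * bid^2 (line through the origin).
    A sketch of the idea on this slide, not the production fitter."""
    num = sum(sr * b * b for b, sr in zip(bids, spend_rates))
    den = sum(b ** 4 for b in bids)
    return num / den

def rbo_bid(target_spend_rate, a):
    """Invert the fitted model to get the bid expected to hit the target spend rate."""
    return math.sqrt(target_spend_rate / a)

# Example: fit_a([1.0, 1.2, 1.5], [2.0, 2.9, 4.5]) ~= 2.0; rbo_bid(3.0, 2.0) ~= 1.22
```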
15. The secret sauce ...
● Besides the main algorithm, some hidden gems actually improved results a lot.
1. Budget smoothing / stochastic bid updates.
2. Splitting the daily budget into a 3-segment guardrail helped a lot with pacing (see the sketch below).
3. Overspend is cut off directly by campaign management.
4. Danger-zone logic burns the remaining bit of budget in the last days.
5. RBOB introduced a variable-length bid period to deal with low-traffic ads.
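Item 2 above is only named, not specified, so here is a purely hypothetical sketch of what a 3-segment daily budget guardrail could look like; the equal segment boundaries and the helper names are assumptions.

```python
def segment_spend_caps(daily_budget, n_segments=3):
    """Hypothetical: split the daily budget into equal segments and cap
    cumulative spend at each segment boundary."""
    per_segment = daily_budget / n_segments
    return [per_segment * (i + 1) for i in range(n_segments)]

def allowed_spend(daily_budget, spent_so_far, segment_idx):
    """How much more can be spent before hitting the current segment's cap."""
    caps = segment_spend_caps(daily_budget)
    return max(0.0, caps[segment_idx] - spent_so_far)

# segment_spend_caps(90) -> [30.0, 60.0, 90.0]
```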
16.
17. Data: how do we use Ad History
● Ex: for a historical adid=xxxxx, get impression / click data from day_start ~ day_end from IQL.
● Augment the click count by assuming some non-clicked impressions are clicked (since user interest is not in scope of bid-opt).
[Diagram: historical impressions and clicks over day_start ~ day_end, and the same history with clicks (revenue) augmented 10x.]
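A minimal sketch of the click-augmentation idea (the 10x factor comes from the diagram; the random-sampling scheme is an assumption, and the real pipeline works on IQL impression/click logs):

```python
import random

def augment_clicks(clicked_flags, factor=10, seed=0):
    """Mark additional non-clicked impressions as clicked so the total click
    count is roughly `factor` times the historical count (a sketch only)."""
    rng = random.Random(seed)
    flags = list(clicked_flags)
    target = min(len(flags), factor * sum(flags))
    non_clicked = [i for i, c in enumerate(flags) if not c]
    need = max(0, target - sum(flags))
    for i in rng.sample(non_clicked, min(len(non_clicked), need)):
        flags[i] = True
    return flags

# 100 impressions with 3 historical clicks -> ~30 clicks after augmentation.
```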
18. Simulator Environment Components
● Bidding history
○ Use historical ad data, so naturally we have all kinds of weird traffic patterns.
○ A synthesized-traffic option exists, but we don't use it since it is too idealized.
● Budget
○ Simulate on ads with different levels of clicks.
○ Found that low-budget/traffic ads are the most challenging type.
● Market Competition
○ Monopoly / Oligopoly / Fair market
● Traffic
○ Use ideal / predicted / uniform traffic to dispatch budget, to investigate the impact of traffic
prediction.
● Initial bid
○ Simulate the impact of the initial bid, to see how important it is and decide whether we need to improve it.
19. Data: how do we use traffic data
● Real traffic and predicted traffic
○ Idea: invest (spend budget) according to inventory (traffic), to optimize CPC.
○ Ex: invest $120 over a week, assuming daily traffic is 2 on weekdays and 1 on weekends.
traffic:  2    2    2    2    2    1    1
budget:  $20  $20  $20  $20  $20  $10  $10
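A minimal sketch of the dispatch rule in the example above: split the budget across days in proportion to traffic.

```python
def dispatch_budget(total_budget, daily_traffic):
    """Split the budget across days in proportion to (predicted) traffic,
    as in the $120 / [2, 2, 2, 2, 2, 1, 1] example above."""
    total_traffic = sum(daily_traffic)
    return [total_budget * t / total_traffic for t in daily_traffic]

# dispatch_budget(120, [2, 2, 2, 2, 2, 1, 1])
# -> [20.0, 20.0, 20.0, 20.0, 20.0, 10.0, 10.0]
```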
20. How to simulate bid-price competing
● For example, in the "fair competition" setup, 10 simulated ads start with
■ budget = (total revenue of augmented clicks) / 10
■ initial bid = 1.25 * first click revenue (historically, 1st bid ~= 1.25 x 2nd bid)
● The 10 ads produce 10 bid prices, plus 1 extra historical revenue as the "bottom-line market price".
■ To win the auction, an ad needs to beat the prices of the other 9 simulated ads and the "market price".
■ If every ad bids lower than the market price, nobody wins.
■ If multiple ads tie for the top price, one is chosen at random.
■ The winner is charged the 2nd price.
[Diagram: the historical revenue (e.g. $1.0) serves as the market bottom line; the 10 simulated ads deliver 10 bids from their algorithms, e.g. [$0.5, $1.5, $7, $15 …].]
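A minimal Python sketch of one simulated auction round as described above; tie-breaking and charging details are simplified, and the function name is an assumption.

```python
import random

def run_auction(bids, market_price, rng=None):
    """One simulated auction round: algorithmic bids compete against the
    historical "market price"; the winner pays the second-highest price.
    Returns (winner_index, charged_price), or None if nobody wins."""
    rng = rng or random.Random()
    top = max(bids)
    if top < market_price:
        return None                        # every bid is below the market price
    winners = [i for i, b in enumerate(bids) if b == top]
    winner = rng.choice(winners)           # random tie-break among top bids
    others = bids[:winner] + bids[winner + 1:]
    second = max(others + [market_price])  # 2nd price, including the market price
    return winner, second

# run_auction([0.5, 1.5, 7, 15], market_price=1.0) -> (3, 7)
```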
21. Example simulation results in macro view
● The simulation of the competition looks like this:
[Plots: remaining_budget of each ad (ideally all go to zero); the daily spend limit & max-spend guardrail trend; bid history of the 10 simulated competing ads; daily budget depletion of each ad.]
22. Example simulation results in micro view
● To find the root cause of a phenomenon: dive deep into the inner control signals of each algorithm.
● Ex: Baseline (the current product) has control signals such as tsr, asr, ncns_rounds, danger_zone, over_spend …
● Modify what we think is fishy => simulate again => confirm the root cause.
[Plots of inner control signals: ncns (no-click-no-spend) signal, tsr (target spend rate), asr (actual spend rate), mode, over_spend, danger_zone, used budget.]
23. Example of aggregated simulation metrics
● An example from investigating various budget-dispatching modes:
○ Simulated 50 ads for each combination of criteria (algorithm / traffic / market competition / budget_depletion_mode / initial_bid_rto ...).
○ Use the median rather than the mean (which is sensitive to outliers) as the aggregated metric.
○ Most of the time we focus on budget depletion rate & deviation, plus CPC, and also monitor whether other metrics show interesting trends.
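As a sketch of the aggregation step, assuming the per-ad simulation results sit in a pandas DataFrame (the column names below are placeholders, not the simulator's real schema):

```python
import pandas as pd

def aggregate_metrics(runs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-ad simulation results with the median (robust to outliers)
    for every criteria combination. Column names are placeholders."""
    keys = ["algorithm", "traffic", "market_competition",
            "budget_depletion_mode", "initial_bid_rto"]
    metrics = ["budget_depletion_rate", "cpc", "bid_volatility"]
    return runs.groupby(keys)[metrics].median().reset_index()
```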
24. How do we evaluate
● Input: bidding history
○ Use historical ad data, so naturally we have all kinds of weird traffic patterns.
● Simulation variants: most of the time, we benchmark by 3 main factors (see the grid sketch below).
1. Algorithms
○ Baseline / HighFreq / PID / RBO / RBOB
2. Update Frequency
○ Each algorithm has variants updating bids every hour / 2 hours / 4 hours / 6 hours / 8 hours.
3. Traffic Level
○ Find low-traffic (10-30 clicks) / median-traffic (30-200 clicks) / high-traffic (200~10000 clicks) historical ads, randomly sampling 500 ads for each level.
○ Simulate based on their 2019-08-01 to 2019-08-15 historical data.
○ Augment clicks x10, and dispatch the budget to 10 ads running the same algorithm, competing with each other.
● Output: metrics
○ Mainly budget depletion & CPC.
○ Also look at mean/median bid price, CBP, uptime, bid volatility, avg daily bid depletion, and avg daily spend depletion.
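The 3-factor benchmark can be laid out as a simple grid; a sketch with the values from this slide (the helper name and constants are assumptions):

```python
from itertools import product

ALGORITHMS = ["Baseline", "HighFreq", "PID", "RBO", "RBOB"]
UPDATE_HOURS = [1, 2, 4, 6, 8]               # bid-update interval in hours
TRAFFIC_LEVELS = ["low", "median", "high"]   # 10-30 / 30-200 / 200-10000 clicks

def benchmark_grid():
    """Enumerate every (algorithm, update interval, traffic level) variant
    to simulate: 5 x 5 x 3 = 75 combinations."""
    return list(product(ALGORITHMS, UPDATE_HOURS, TRAFFIC_LEVELS))
```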
25.
26. The insights it brings us - bid-opt algorithms
● Baseline
○ Weak, especially for low-traffic ads.
○ Relatively more sensitive to traffic-prediction and initial-bid errors.
● HighFreqBid
○ A/B tested online twice (once each by the adScience and SMB teams); both concluded no move-forward.
○ By simulation, we found it boosts the price too fast (hourly) for low-traffic ads, causing a high CPC.
● RBO
○ Used to be even worse than Baseline for low-traffic ads (same reason as HighFreq); we developed 2 solutions (RBO/RBOB) and fixed it via simulation.
○ Both versions have similar performance: one tends to spend early with a slightly higher CPC, the other tends to wait longer and spend at the end of the campaign lifetime.
○ Robust against traffic-prediction and initial-bid errors.
● PID
○ Parameters chosen to beat RBO in Aug 2019 ended up losing in Nov 2019.
○ Parameter tuning is important.
27. The insights it brings us - upstream components
● Traffic Prediction
○ For the US market, even if we drop traffic prediction and use uniform traffic instead, the difference in budget depletion is ~5% for Baseline and only ~1% for PID and RBO.
● Initial Bid
○ ~3% difference in budget depletion for Baseline on ads with a 1-month lifetime.
○ Almost no impact for high-frequency update algorithms like PID & RBO.
● Tilting Budget Pacing
○ Try different patterns to tilt the budget pacing, to improve budget depletion.