Optimizing Budget Constrained Spend in Search Advertising


Published on

Search engine ad auctions typically have a signi cant frac-
tion of advertisers who are budget constrained, i.e., if al-
lowed to participate in every auction that they bid on, they
would spend more than their budget. This yields an im-
portant problem: selecting the ad auctions in which these
advertisers participate, in order to optimize di erent system

Published in: Technology
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Optimizing Budget Constrained Spend in Search Advertising

  1. 1. Optimizing Budget Constrained Spend in Search Advertising ∗ Chinmay Karande Aranyak Mehta Ramakrishnan Srikant chinmayk@fb.com aranyak@google.com srikant@google.com Google Research Mountain View, CA, USAABSTRACT Each advertiser also specifies a daily budget, which is anSearch engine ad auctions typically have a significant frac- upper bound on the amount of money they are prepared totion of advertisers who are budget constrained, i.e., if al- spend each day. While many advertisers use bids as the pri-lowed to participate in every auction that they bid on, they mary knob to control their spend and never hit their budget,would spend more than their budget. This yields an im- there exists a significant fraction of advertisers who wouldportant problem: selecting the ad auctions in which these spend more than their budget if they participated in ev-advertisers participate, in order to optimize different system ery auction that their keywords match. Search engines of-objectives such as the return on investment for advertisers, ten provide an option to automatically scale the advertiser’sand the quality of ads shown to users. We present a sys- bids [1, 2], but a substantial fraction of budget constrainedtem and algorithms for optimizing such budget constrained advertisers do not opt into these programs. For these ad-spend. The system is designed be deployed in a large search vertisers, the search engine has to determine the subset ofengine, with hundreds of thousands of advertisers, millions of auctions the budget constrained advertiser should partici-searches per hour, and with the query stream being only par- pate in. This creates a dependence between auctions ontially predictable. We have validated the system design by different queries, and leads to essentially a matching or animplementing it in the Google ads serving system and run- assignment problem of advertisers to auctions.ning experiments on live traffic. We have also compared our In this paper, we consider the problem of optimized budgetalgorithm to previous work that casts this problem as a large allocation: allocating advertisers to queries such that budgetlinear programming problem limited to popular queries, and constraints are satisfied, while simultaneously optimizing ashow that our algorithms yield substantially better results. specified objective. Prior work in this area has often chosen revenue as the objective to optimize. However, the long term revenue of a search engine depends on providing good value to users and to advertisers. If users see low quality ads,Categories and Subject Descriptors then this can result in ad-blindness and a drop in revenue.G.1.6 [Optimization]: Linear Programming; K.6.0 [General]: If advertisers see low return on investment (ROI), then theyEconomics will reduce their bids and budgets, again resulting in a drop in revenue. Thus, we explore two other objectives in this paper: improving quality, and advertiser ROI.1. INTRODUCTION Search ad auctions have emerged as the primary model for Paper Outline We describe the problem of optimized bud-monetizing the value provided by search engines. Advertis- get allocation in Section 2, followed by related work in Sec-ers use phrases (keywords) to specify the set of queries they tion 3. We present our algorithms and system design in Sec-are interested in, and bid the cost they are prepared to pay tion 4, and discuss certain key properties of the algorithm inper click on their ad. For each search query, the set of ads Section 5. In Section 6, we show that our algorithms yieldto show, the order in which they are shown, and the cost substantially better results than prior work and also describeper click for each shown ad are determined via an auction. the results of experiments on live traffic. We conclude with some closing thoughts in Section 7.∗The author is currently at Facebook, Inc., Menlo Park, CA,USA. The work described in this paper was done while theauthor was at Google. 2. PROBLEM DEFINITION Let A be the set of advertisers, and Q the set of queries.Permission to make digital or hard copies of all or part of this work for Each advertiser a ∈ A comes with a daily budget Ba . Letpersonal or classroom use is granted without fee provided that copies are G(A, Q, E) be a bipartite graph such that for a ∈ A andnot made or distributed for profit or commercial advantage and that copies q ∈ Q, edge (a, q) ∈ E means that an ad of a is eligible forbear this notice and the full citation on the first page. To copy otherwise, to the auction for query q (a’s keywords match q). Let ctr(a,q)republish, to post on servers or to redistribute to lists, requires prior specific be the probability of a click on a’s ad for q, and bid(a,q) bepermission and/or a fee.WSDM’13, February 4–8, 2013, Rome, Italy. the amount a is willing to pay per click. (Note that ctr(a,q)Copyright 2013 ACM 978-1-4503-1869-3/13/02 ...$15.00. is the position-independent CTR, i.e., the probability of a
  2. 2. click at some chosen fixed position. In other words, ctr(a,q) or advertiser clicks per dollar. Thus their approach yieldsdoes not depend on the position of the ad.) an optimal solution to the click maximization problem in When a query q arrives, the eligible ads a for q are ranked the general GSP setting. However, there are two reasonsby bid(a,q) ctr(a,q) and shown in that order. Denoting the why this work is not the final word on the allocation ap-jth ad in the order as aj , the cost per click of aj is set as proach. First, the LP can only run over head queries due to resource constraints, which brings up an interesting ques- cpc(aj ) = bid(aj+1 ,q) ctr(aj+1 ,q) /ctr(aj ,q) tion: Can a non-optimal algorithm that runs over the entireThis is known as the generalized second price (GSP) auction query stream beat an optimal algorithm that is restricted to(see, e.g., [25, 14, 6]). the head? Second, the LP formulation can yield solutions Let Ta denote the spend of the advertiser a if a partici- that are clearly unfair, in that the allocation for some adver-pates in all the auctions for which a is eligible via the key- tisers is very different from what the advertiser would chooseword match (ignoring a’s budget). If Ta > Ba , the advertiser for themselves for that objective (see Section 5). Hence itis budget constrained, and the search engine has to limit (or is unclear whether LP solutions can be deployed by searchthrottle) the set of auctions in which the advertiser partici- engines.pates. A second stream of work focused on optimizing search en- gine revenue in a first price auction. Mehta et al. [22] andObjectives. For budget constrained advertisers, the search Buchbinder et al. [9] both provide an online approximationengine can select to optimize different objectives such as: algorithm for optimizing revenue, with a best possible ap- • the quality of the ads shown, e.g., maximize the position- proximation guarantee of 1 − 1/e 63%, for the scenario in independent predicted click-through rate, which we do not know anything about the future distribu- • reduce advertiser cost-per-click (maximize the number tion of queries. In contrast, we assume that we can predict of clicks), or future distributions of queries, albeit noisily. Mahdian et al. [21] extended the algorithm from [22] to provide guaran- • increase advertiser ROI. tees in the setting when we have unreliable estimates. [19, We do not have full knowledge of the graph G, but only 13] analyzed revenue maximization in an average case set-information from past data, i.e., the graphs from previous ting where queries arrive in a random order, or are pickeddays. We also require practical algorithms which work on- i.i.d from a distribution. All the above papers focus solelyline, i.e., given the next query, decide, in sub-second re- on search engine revenue, assume a first price auction, andsponse time, which ad impressions to show. We can now do not consider multiple slots or positions. We focus on op-define the problem as follows. timizing very different objectives, such as user experience and advertiser ROI, in the second price GSP auction, withThe Optimized Budget Allocation Problem: Given multiple slots and positions.information about the past as G (A, Q , E ) including the There has been considerable related work in budget allo-bids and CTR for each advertiser-query pair, the budget cation for display ads, e.g., [11, 12, 15, 16, 26]. This workBa , for each advertiser a, and a specified objective function also does not consider second price auctions, or the objec-to optimize: For each query arriving in an online stream of tives we study.queries Q, decide which advertisers should participate in the There has also been work on designing new incentive com-auction. patible auctions in the presence of bidders with budget con- straints, initiated by [4, 8]. Our goal is to optimize budgets3. RELATED WORK in the context of the GSP auction used by search engines, There have been two broad approaches to optimizing bud- and hence we do not consider alternate auction designs.get constrained spend: allocation and bid modification. Al- Bid Modification The second approach [7, 10, 17, 20, 24]location treats bids as fixed, and allows only decisions about studies the advertiser’s problem of bidding optimally in or-whether the advertiser should participate in the auction. der to maximize ROI (or some notion of utility). ChangingThis is our setting, where we are constrained to not change bids in such a manner is an alternate solution to dealing withbids, but only optimize allocations. budget constrained advertisers: if the advertiser permits, the The second approach, bid modification, is in a setting search engine can scale bids down until the advertiser is nowhere bids can be changed. This body of work typically longer budget constrained. However, despite the availabilityconsiders the problem from the advertiser’s perspective, and of a free option to automatically scale bids [1], a substantialassumes full knowledge of the information that advertisers fraction of budget constrained advertisers have not opted intypically have (such as the value of clicks or conversions). to use this product. For these advertisers, the search engineHowever, this work can also be adapted to be applicable (as a policy decision) has to use the allocation approach,from a search engine’s perspective, for advertisers who have not bid modification. Hence the work on bid modification,opted in and allow the search engine to change their bids. while clearly an appealing alternative, is not applicable in We next describe the related work in the allocation and our setting.bid modification approaches. Despite the lack of applicability, it would be interesting toAllocation The paper by Abrams et al. [5] is the closest understand how optimized budget allocation fares againstto our work. They solve the offline problem (with complete bid scaling. We compare our algorithms to bid scaling inknowledge of future query frequency) for head queries (the Section 6.most popular search queries) in the GSP auction setting us-ing a linear program (LP), to optimize search engine revenue
  3. 3. 4. ALGORITHMS To achieve this goal, our algorithm uses a third input, Given the problem of optimizing budget constrained spend, the rank Rθ,a of an impression for a given advertiser a andthe first step is to neither over- nor under-spend.1 The naive metric θ. Defineway to do this is to let advertisers participate in auctions un- • Fθ,a (µ): Estimated fraction of maximum spend Ta fortil they hit their budget, and then make them ineligible for which θ(i) ≤ µ. In other words, F is the estimatedthe rest of the day. Clearly, this will yield very biased traffic cumulative distribution function of θ.to the advertiser, and also skew auction competition towardsthe earlier part of the day. The next simplest approach, • Rθ,a (µ) = 1 - Fθ,a (µ).which does not have these drawbacks, is Vanilla Probabilis- The lower the value of the rank R, the better the impressiontic Throttling (VPT). scores on our metric. For each advertiser a, define: We now define the Optimized Throttling algorithm: • Ba : the remaining budget for the day (or time period). For each arriving query q: • Ta : the remaining maximum spend for the rest of the For each budget constrained advertiser a: day, i.e., the total spend if the advertiser had unlimited If Rθ,a (θ(i)) ≤ Ba /Ta , then a budget. participates in the auction.We now define the Vanilla Probabilistic Throttling al-gorithm: Figure 2: Algorithm OT For each arriving query q: While the algorithm appears to be a straightforward greedy For each budget constrained advertiser a: algorithm, there are several subtleties: Flip a coin with P [Heads] = Ba /Ta . • The algorithm yields a solution that is “fair” to each If heads, a participates in the auction. advertiser (formally defined, and proved in Section 5). Figure 1: Vanilla Probabilistic Throttling (VPT) • The algorithm yields an optimal fair solution under certain constraints. While we can find many examples If our estimate of Ta is accurate, then each advertiser where the constraints don’t hold, the constraints holdspends very close to her budget in expectation (and with often enough that the algorithm is not too far fromhigh probability, by Chernoff bounds). Advertisers also re- optimal in practice.ceive a representative sample of the traffic they are eligible • We have transformed the domain from the space offor. queries to the distribution of some property of queries. Predicting the frequency of queries in the tail is in-4.1 Optimized Throttling tractable. Predicting the distribution of specific prop- We now present algorithms for optimizing one or more of erties of the queries in the tail is very tractable (forthe following objectives: properties we use in our algorithms). • average quality of ads shown, represented by CTR. • The choice of metric lets us optimize a wide range • clicks per dollar, of objectives, or combinations of objectives (discussed next). • conversions per dollar, We now define five instantiations of OT, corresponding • the advertiser’s profit, using the difference between the to the four objectives we listed earlier, and a fifth objective bid and the cost per click as the estimate of profit. that combines quality and clicks: For the chosen objective, given a candidate ad impression i(i.e., for a specific query and advertiser), we compute a met- Objective θ(i)ric θ(i) which tracks the desired objective. Given a choice OT-CTR Ad quality ctribetween two impressions (for the same advertiser), we would OT-Clicks Clicks 1/cpciprefer to show the impression with the higher value of the OT-Profit Profit (bidi − cpci )/cpcimetric. For example, if the objective is quality, the metric OT-Conversions Conversions cvri vali /cpcicould be the position-independent predicted click-through ctri (bidi − cpci ) OT-CTR-Profit Blendrate of the advertiser for this query, reflecting our desire to cpcishow higher-quality impressions. Conceptually, we wouldlike to rank all the impressions for an advertiser by the de- Figure 3: Instantiations of OTsired metric, and choose the impressions with the highestmetric score until the budget is filled. The first metric, ctri is straightforward: we are using1 position-independent predicted CTR as a proxy for qual- While most advertisers are clearly budget constrained or ity. (Of course, one could use any arbitrary quality metricunconstrained, advertisers at the margin may switch back- instead of CTR.)and-forth between the two states, based on traffic. It isstraightforward to handle these marginal cases. For ease of To understand the next three metrics, it is helpful toexposition, we ignore this issue in the paper; however our multiply the numerator and denominator by the position-implementation does handle these cases gracefully. dependent predicted CTR. For example, in OT-Clicks, the
  4. 4. numerator now becomes the expected number of clicks, and (purely for ease of exposition). Since a search engine has athe denominator the expected cost, and the ratio is the clicks large number of advertisers, we would like to compress theper dollar. Since the total spend is fixed for budget con- information in R at serving time. We compress R into astrained advertisers, optimizing clicks is equivalent to opti- histogram H.mizing clicks per dollar. 2 Recall that R is only used to answer the question of whether For OT-Profit, assuming that bidi is the value to the R(θ(i)) is less than ip (= Ba /Ta ). So if the estimate of ipadvertiser, bidi −cpci is the expected profit to the advertiser was very stable, we need just two buckets in the histogramif there is a click on this impression. So the metric for OT- H, with the boundary at the value c∗ such that R(c∗ ) = ip.Profit is the expected profit per dollar of spend. With two buckets, we would need just 8 bytes of data per For OT-Conversions the metric is the expected conver- budget constrained advertiser: the value c∗ and the value ofsion value per dollar, given the value of a conversion on this R(c∗ ).impression vali , and a model that predicts the conversion We may choose to create additional buckets around therate cvri . Building machine learning models for estimat- threshold c∗ , based on the tradeoff between increased gainsing cvri is beyond the scope of the paper. However, we from choosing the highest scoring impressions (see below)will note the existence of commercial systems that estimate versus memory constraints. For each bucket m in H ex-cvri given advertiser-specific conversion data, e.g., [2]. Ad- cluding the last bucket, we store the bucket boundary andvertisers do not necessarily need to provide conversion data the value of R at the bucket boundary. We refer to thesein order to benefit from the techniques in the paper. One histograms as throttling parameters.can build models for estimating conversion rate that are not Using the estimates at serving: When a query arrives,advertiser-specific, e.g., we found that ctri is correlated with we need to determine, for each budget constrained advertisercvri . a, whether a participates in the auction. The input consists The final metric simply multiplies the metric for CTR of the histogram H, the current value of ip, and the valueand profit. The intuition is that for advertisers with a lot of of θ(i) for the current impression i. Let θ(i) be in bucketvariance on CTR but not much on profit, the algorithm will m. Let Hb (m) denote the upper boundary of bucket m, andfocus on CTR. Similarly for advertisers with more variance let Hr (m) = R(Hb (m)). Then the advertiser participates inon profit than CTR, the algorithm will focus on profit. Thus the auction with the following probability:if the search engine cares about both CTR and advertiser if Hr (m) ≤ ip profit, the blended metric will likely yield better results than  1,simply averaging the results of the individual metrics. 0, if Hr (m − 1) ≥ ip ip−Hr (m−1) , otherwise  Hr (m)−Hr (m−1)4.2 System design and implementation The first two cases are straightforward, and follow directly We next describe our implementation of the algorithm in from the goal that we (do not) show the impression if it isthe production Google ads serving system. Our system has (not) in the top ip fraction of spend. In the third case, thethree primary components: probability that the impression is in the top ip fraction of • estimating Ba /Ta , spend is given by (ip − Hr (m − 1))/(Hr (m) − Hr (m − 1)), and hence we show the impression with that probability. • estimating and compressing Rθ,a , and Implementation: We have built our data collection pipeline • using the estimates at serving time. on top of Google’s sawzall [23] infrastructure, which allowsWe will use OT-CTR to illustrate the techniques used, and us to process historical query data in parallel. The throt-discuss any differences between OT-CTR and the other in- tling parameters generated by the data collection pipeline isstantiations as they come up. written to the Google File System (GFS) [18]. The data is stored in the protocol buffer format [3], which reduces theEstimating Ba /Ta : This component estimates the impres- storage requirement as well as make the transfer and pro-sion probability ip, where ip is defined as Ba /Ta . In other cessing of the data more efficient. From GFS, the throttlingwords, ip is the probability with which we should allow an parameters data is picked up by the ads data push system,impression of the advertiser to participate in an auction, which writes it to one of its data channels. The ads servingin order to show her impressions uniformly through the re- system gets the updated throttling parameters through thismainder of the day, and exhaust her budget at the end of the channel.day. We estimate ip using traffic information from the past,and using the available budget. Given the inherent noise in Interactions between budget constrained advertis-traffic, a feedback loop is used at frequent intervals to adjust ers: Out of the metrics in Figure 3, ctri , bidi , cvri andthe probability. Clearly, the more accurate the estimate of vali are all functions purely of the impression, and do notip, the more the gains from optimization. change based on the other ads in the auction. However, cpci is a function of the runner-up, since we use a second-Estimating and compressing Rθ,a : We use historical price auction. We analyzed the logs, and found that budgetdata to compute the cumulative distribution function Fθ,a . constrained ads are more likely to be next to budget uncon-Rθ,a is a trivial transformation of Fθ,a . In the rest of this strained ads than budget unconstrained ads. However, wesection, we will drop the subscripts and refer to Rθ,a as R will have instances with consecutive budget constrained ads.2 Some metrics like ctri are purely a function of the impres- We use an iterative technique for improving the performancesion. However, metrics like cpc may change based on the of the online algorithms in these cases.other ads in the auction. We discuss this issue in Section 4.2. The intuition behind the iterative technique is to run sev-
  5. 5. eral (simulated) iterations of the auction. In each iteration,we compute the metric θ for each budget constrained adver- • Rank the impressions i ∈ Ea in order of decreas-tiser (including those that do not participate in the auction ing 1/cpci .in the current iteration), and based on the value of the met- • Pick the top impressions in Ea according to thisric, decide whether that advertiser participates in the next ranking until the budget runs out, i.e. the largestiteration. Note that an impression may be removed in one prefix Ia , s.t. i∈Ia spendi ≤ Ba , and at mostiteration and re-enter in a subsequent iteration. While we one additional fractional impression to finish thecannot prove convergence, in practice this often converges budget.in a few rounds, or at least leads to an improved solutionover simply using the value of θ from the first iteration. Figure 4: Offline-OT-Clicks-Single-Advertiser5. KEY PROPERTIES OF ALGORITHM The overall effectiveness of the algorithm depends on two Without fractional impressions, this is the integral knap-factors: the accuracy of the prediction of future traffic (total sack problem. Since one click is a tiny fraction of adver-traffic and the distribution of the metric), and the intrinsic tiser spend, we allow the choice of one fractional impres-effectiveness of the algorithm. To understand the latter, we sion, thereby converting the problem to a fractional knap-analyze the algorithm on the offline version of the problem, sack problem. Observing that spendi = αi ctri cpci andin which the advertiser-query graph is known. We will focus therefore αi ctri /spendi = 1/cpci , we get the algorithm inon the instantiations with a linear objective: OT-Clicks, Figure 4, which is a simple greedy algorithm, using the ratioOT-Profit and OT-Conversions. For non-linear objec- of the expected value from the click to the cost of the click.tives such as CTR, optimality with even a single budget con- Theorem 1 follows from the well known optimality of thestrained advertiser may require allocation in a manner that greedy strategy for the fractional knapsack problem, and itsis clearly against the advertiser’s interest. So algorithms like proof is omitted here.OT-CTR which are designed to both be good for advertisersand improve quality cannot be optimal. Theorem 1. Offline-OT-Clicks-Single-Advertiser com- We first consider the special case of a single budget con- putes an optimal solution to the ROI maximization problemstrained advertiser, and show that our algorithm is optimal for a single budget constrained advertiser.for linear objectives. We then discuss the issue of fairnesswhen there are multiple budget constrained advertisers, and 5.2 Fair Allocationsshow by example that linear programming can yield solu- We begin with the following definition of an optimal allo-tions that are optimal but not fair. When we restrict the cation:space of solutions to fair solutions, we show that our algo- Definition 1. We call an allocation optimal if it maxi-rithm yields an optimal solution even with multiple budget mizesconstrained advertisers, as long as there are no adjacent bud-get constrained advertisers in a given auction. Obviously, a∈A wa i∈Ia αi ctriwe do get adjacent budget constrained advertisers in the a∈A wareal world – but the optimality result with that constraint over all possible allocations (given the wa ≥ 0, which aresuggests that the algorithm will perform well in practice. arbitrary advertiser specific weights). For ease of exposition, we will choose the simplest instan-tiation, OT-Clicks as the representative algorithm in the However, this definition has the problem that in trying toproofs. It is straightforward to sketch out similar proofs for maximize the weighted average of advertiser ROI, we mayOT-Profit and OT-Conversions. end up sacrificing the interests of some advertisers, as the following example illustrates.5.1 Optimal For A Single Budget Constrained Advertiser Example 1. There are two budget constrained bidders, We start by proving that given G(A, Q, E) and a single a and b, each with a budget of $100. There are two differentbudget constrained advertiser a ∈ A, OT-Clicks maximizes queries q1 and q2 , each with 100 instances. q1 has a mini-clicks per dollar for a. While the proof is obvious, it is useful mum reserve cpc of $1, and q2 has a reserve of $2 (one mayas a building block to more interesting results. replace the reserves by an unconstrained bidder, keeping the Given an advertiser a, we define: example unchanged). The bidders bid the following values • Ea to be the set of impressions in queries where a is for the queries (a does not bid on q2 ): eligible to participate in the auction, and • Ia ⊆ Ea to be the set of impressions where a partici- q1 q2 pates in the auction. a 20 −The total expected clicks is i∈Ia αi ctri , where αi is the b 10 10position normalizer. For fixed budget, maximizing clicks min 1 2per dollar is the same as maximizing total clicks, which iscaptured by the following linear program: Given a, find For ease of exposition, we assume that the CTR is equal Max αi ctri , s.t. spendi ≤ Ba (1) for all advertisers and query pairs, in both positions. To Ia ⊆Ea maximize total clicks, or equivalently, clicks per dollar, the i∈Ia i∈Ia
  6. 6. optimal solution is to let a participate in q1 , and b in q2 .Then a gets 100 clicks and b gets 50, giving a total of 150 1. Begin with an allocation with only budget uncon-clicks at a cost-per-click of $1.33. But this solution is not strained advertisers.fair to b, who would rather show for q1 and get a cpc of $1 2. For each budget constrained advertiser a ∈ A (inand hence 100 clicks instead of 50. In this scenario a would turn):get only 10 clicks at a cpc of $10, giving a total of 110 clicks –Run Offline-OT-Clicks-Single-Advertiser for a.at an average cpc of $1.81. 2 –Update I. Example 1 motivates the following definition: 3. If the allocation has not converged, go to step 2. Definition 2. We define an allocation I ⊆ E to be a fairallocation for the clicks objective, if ∀ a ∈ A: Figure 5: Algorithm Offline-OT-Clicks. a. The expected spend of a is equal to Ba (a exhausts its budget), or Ia = Ea (we show every impression of a it due to the insertion or removal of other budget constrained is eligible for). impressions). As an aside, this key property also holds for first price auctions, and the following results also hold for b. Given the allocation of other advertisers (i.e. IIa ), Ia first price auctions. maximizes the total expected clicks that a can obtain Given this property, for each advertiser a, the 1 / cpc or- within its budget. dering of the impressions in Ea is fixed throughout the al-We call an allocation I an optimal fair allocation if it is gorithm, independent of the allocations of the other adver-a fair allocation, and it maximizes tisers. From the definition, it is easy to see that every fair allocation has the property that for each a ∈ A, Ia is a prefix a∈A wa i∈Ia αi ctri of this fixed ordering of Ea , since a non-prefix would violate a∈A wa Property (b) in Definition 2. By definition of the algorithm,among all fair allocation (where the wa ≥ 0 are arbitrary this is true also for the allocations produced during everyadvertiser specific weights). step of the algorithm. We call allocations with this property prefix-allocations. We can similarly define fair allocations and optimal fairallocations for the profit objective and the conversions ob- Lemma 2. Algorithm Offline-OT-Clicks converges to ajective. fair allocation if there are no adjacent constrained bidders In Example 1, the allocation which maximizes the sum in any query.(giving 150 clicks) is not a fair allocation, while allocating Proof. Fix an advertiser a, and consider the allocationboth a and b to q1 is (even though this reduces the total to a in each round. We claim that the prefix chosen forclicks to 110). We note that our definition of fair allocation a only increases in length in subsequent rounds. Since foris analogous to that of a Nash equilibrium in games. each advertiser the prefix cannot increase indefinitely, this means that the algorithm converges to some allocation, say5.3 Optimal Fair Allocation When No Adja- I ∗ . Property (a) in Definition 2 holds for I ∗ because of the cent Budget Constrained Advertisers termination condition of Offline-OT-Clicks. Property (b) When there are multiple budget constrained advertisers, holds by definition of Offline-OT-Clicks-Single-Advertiser,a natural local algorithm is to cycle over the different ad- which is run in every round of Offline-OT-Clicks.vertisers until convergence, running Algorithm Offline-OT- It remains to prove the claim that the prefix for a can onlyClicks-Single-Advertiser for each advertiser a. In the re- increase in each round. Suppose this is true up to the timemainder of this section we analyze this algorithm, which is we process advertiser a in round k. In between the timesdefined in Figure 5. we process a in rounds k and k + 1, the algorithm may have In a GSP auction, there are two ways in which the intro- introduced impressions of other advertisers in the auctionsduction or removal of an impression i of one budget con- for which we show a’s impressions in round k. The onlystrained advertiser can effect an impression i of another: effect this can have on a’s impression is to possibly lower its position, and therefore of its expected spend. Thus, in • i can change the cpc of i round k + 1, the algorithm may need to pick a longer prefix to finish a’s budget. • i can change the position of i , and hence the expected spend from i . For two prefix-allocations I, J, we say I J if for every advertiser a ∈ A, the prefix length in I is at most the prefixDue to this interaction, it is unclear whether Algorithm length in J, and therefore Ia ⊆ Ja .Offline-OT-Clicks always yields an optimal fair allocation.However we will show that it does converge to a optimal fair Lemma 3. Let I ∗ be the fair allocation that Algorithmallocation when there are no adjacent budget constrained Offline-OT-Clicks converges to (when there are no adja-bidders, i.e., in the auction ranking of all eligible bidders, cent constrained bidders). Thenthere are no consecutive budget constrained bidders. The I∗ I, for all fair allocations Ikey property is that in such a scenario, the cpc of a budgetconstrained impression is independent of other budget con- Proof. If this is not true for some fair allocation I, thenstrained impressions (even though the position may change consider the first time during the run of the algorithm that
  7. 7. some advertiser a’s prefix becomes longer than its prefixin I. Comparing to I, the algorithm’s current allocation max αsi ctrqsi xqshas all advertisers a = a with smaller or equal prefixes. q,s,iThus the position normalizers of a’s impressions are largeror equal during this step of the algorithm than in the al- Allocation constraint: xqs ≤ Nq ∀qlocation I. This implies that the prefix of a in the current sallocation should be shorter or equal than that in I, since Budget constraint: cpcqsi αsi ctrqsi xqs ≤ Bi ∀ithe expected spend in each auction is at least that in I, a q,scontradiction. One can optimize for other linear metrics, such as rev-Note that Lemma 3 implies that there is a unique -minimal enue by suitably changing the objective function. It is notfair allocation, and that the algorithm converges to it. possible to directly optimize for non-linear metrics such as CTR. Theorem 4. Algorithm Offline-OT-Clicks converges to The advantage of LP (compared to our approach) is thatan optimal fair allocation if there are no adjacent constrained the LP fully incorporates interactions between different bud-bidders in any query. get constrained advertisers. However, even the most efficient Proof. From Lemma 2 we know that the algorithm con- LP solvers cannot solve linear programs on the volume ofverges to a fair allocation I ∗ . From Lemma 3 we get that data a search engine sees in a day. Thus we have to limitfor every advertiser a, and every fair allocation I, a’s pre- LP to the head portion of query traffic, and use an approachfix in I ∗ is no longer than its prefix in I. Thus a spends such as Vanilla Probabilistic Throttling on the long tail.the same amount of money (in expectation) in I ∗ as in I,but spends it on a subset (I ∗ )a ⊆ Ia such that impressions 6.1.2 Bid Scalingin Ia (I ∗ )a have lower (or equal) ratios of value of click to Both the OT algorithms and the LP formulation work un-cost of click, as impressions in (I ∗ )a . This implies that a der the constraint that the bidder’s inputs (bids) can not begets at least as much total expected value in I ∗ as in I, in changed by the search engine, but they can only be throt-turn implying that I ∗ maximizes any weighted average (over tled. Bid Scaling algorithms go outside this design space byadvertisers) of the total expected value obtained, among all lowering the bids of the constrained bidders appropriately.fair allocations. As we discussed in Section 1, bid scaling is not an option for our problem. Nevertheless, it is interesting to compare our As mentioned earlier, it is easy to prove similar statements approach against bid scaling.about other linear objectives such as optimizing profit and Our bid scaling algorithm finds one bid multiplier per bid-optimizing conversions. der and applies it to all the advertiser’s bids, similar to the Budget Optimizer product in Google Adwords [1]. The mul-6. EMPIRICAL EVALUATION tiplier is calculated so as to spend exactly the bidder’s bud- We report two types of experiments below, offline simu- get.lations to compare different optimized allocation solutions,and live experiments on Google traffic. We start with a 6.2 Simulation Methodologydescription of prior approaches: LP and BidScaling. We conducted simulations using a 20% sample of all US queries made over a week to the Google search engine. We6.1 LP and BidScaling sorted queries by the total impressions for that query (summed over all query instances), and picked the queries with the6.1.1 Linear Programming most impressions as the “head” queries for the LP. The num- Assuming complete knowledge of the data, a theoretical ber of queries in the head was chosen so that the LP couldbenchmark for any budget allocation algorithm can be ob- run in memory. (Note that as we move from the head to thetained via linear programming [5]. For any query q, let s tail, each query has relatively few instances. So even dou-denote the set of bidders chosen to participate in the auction bling the memory will not substantially increase the fractionby an allocation mechanism. Let C(q) be the collection of all of revenue or clicks covered by the “head” queries.) For eachsuch sets, generated by any conceivable algorithm. Clearly, set (head and tail), we computed an appropriate budget forwe can completely specify an allocation policy by specifying that set by scaling down the total budgets. For each queryfor each query q, the set s ∈ C(q) of bidders permitted by we also have the candidate ads together with the relevantthe algorithm to participate in the auction for q. metrics, namely, the bid and the predicted CTR. Our sim- We can then attempt to discover the best allocation policy ulation is offline, i.e., the set of queries and candidates is(for a given linear objective, say click maximization) using fixed. Since all the algorithms we simulate are time inde-linear programming. Let Nq be the number of times query q pendent (as opposed to some of the bid scaling algorithmsappears in the data and xqs be the number of times the set studied earlier, e.g., [22, 9, 13, 16, 15]), we do not need tos of bidders is selected for query q. For a bidder i ∈ s, let worry about the time-arrival order of the queries.cpcqsi and ctrqsi be the cost-per-click and click-through rate We simulated the following throttling algorithms for ourfor i in the auction for q. Let αsi be the position normalizer first set of comparisons:for impression i. Let Bi be the budget of bidder i. Then, 1. VPT: Vanilla Probabilistic Throttling.we can discover the best allocation policy that maximizestotal clicks provided to budget constrained advertisers as 2. OT-Clicks: Optimized Throttling, objective is clicksthe solution of the following linear program: (or inverse of cpc).
  8. 8. 3. OT-CTR: Optimized Throttling, objective is CTR. 4. BidScaling, as described in Section 6.1.2 5. LP-Clicks: Click maximizing Linear Program on head queries.6.3 Results We show the changes in various objectives relative to thebaseline of Vanilla Probabilistic Throttling (VPT). It is im-portant to note that while we expect the overall conclusionsto carry over to an online setting where the query distribu-tion changes over time, the exact numbers will change. Ingeneral, the gains from optimized budget allocation or bidscaling will be significantly lower in live experiments due tochanges in query traffic. For this reason, as well as data Figure 6: Impact on clicks-per-dollar, over budgetconfidentiality, we omit the scale from our graphs below. constrained campaigns. The baseline is VPT.6.3.1 Comparison with LP Figure 6 shows the change in clicks per dollar for budget point, all figures represent performance using all queries,constrained advertisers for each of the algorithms. The first not just the head.) Here OT-Clicks does much worse thanset of numbers, “head”, show the results when we artificially BidScaling, though it is still positive. In fact, even OT-restrict all the algorithms to operate over the same set of CTR beats OT-Clicks. OT-Profit, which attempts tohead queries as LP-Clicks, with VPT on the tail. Since optimize profit, yields similar results to BidScaling.LP-Clicks is not just optimal, but can also generate solu- As discussed in the introduction, user experience (qual-tions that are not fair (unlike the other algorithms), it is not ity of ads) is as important as advertiser ROI for long-termsurprising that LP-Clicks outperforms the alternatives. success. At first glance, one might expect that optimizing However, when we allow the algorithms to optimize over clicks per dollar would yield similar results to optimizingthe entire dataset – the “all” numbers – the algorithms that CTR: doesn’t a higher click-through rate mean more clicks?can use the full data dramatically outperform LP-Clicks. However, what matters for optimizing clicks per dollar (withIn fact, even OT-CTR, which is optimizing CTR and not a fixed budget) is the cost per click, not the click-throughCPC, yields a higher drop in CPC (or equivalently, more rate.clicks per dollar) than LP-Clicks. The reason for the poor Figure 8 shows the change in CTR for each of the al-performance of LP-Clicks is that the LP can be run only gorithms. OT-CTR dramatically outperforms all other al-on the head, and even though the head queries account for gorithms, not surprising since it is the only algorithm ex-a substantial portion of revenue, they are relatively homo- plicitly trying to optimize quality. Interestingly, while bothgeneous – the potential gains from optimization are more OT-Profit and BidScaling gave similar gains in profit-in the tail than the head. We found that this held for the per-dollar, their effect on quality is quite different: OT-other metrics as well, i.e., the substantial majority of the Profit improves CTR, while BidScaling reduces CTR.gains from optimization came from the tail queries.3 6.3.3 Multiple Objectives6.3.2 Comparison with BidScaling We now present results with metrics that blend the CTR The other interesting comparison in Figure 6 is between and profit objectives. In Section 4 we had conjectured thatOT-Clicks and BidScaling. BidScaling performs slightly blended metrics might yield better results than individualbetter than OT-Clicks when restricted to head queries, as metrics, since different advertisers may have better scopemany advertisers may appear for a relatively small number for optimization along different dimensions. Figures 9 andof queries in the head. Thus OT-Clicks, which doesn’t have 10 show the impact of two blended metrics: ctr(bidi −the flexibility to scale bids, has a bit less room to maneu- cpci )/cpc, and ctr2 (bidi − cpci )/cpc. Notice that OT-ver. Over all queries, OT-Clicks has much more scope to CTR-Profit, which uses the former as the metric, almostdifferentiate between queries, and hence does slightly better matches OT-Profit on profit-per-dollar, while yielding sig-than BidScaling. nificantly higher gains in CTR than OT-Profit. OT-CTR2- However, OT-Clicks may be getting the gains by drop- Profit further increases CTR gains, for a bit more drop inping high bid, high cpc clicks which might still yield more profit-per-dollar. In addition to validating our conjectureprofit for the advertiser than low bid, low cpc clicks. Figure 7 that blended metrics may yield better results, such blendedshows how the algorithms do on estimated profit-per-dollar: metrics let the search engine pick any arbitrary point in athe sum of the bids minus the total cost, divided by the to- curve that trades gains in user quality for gains in advertisertal cost, over all budget constrained campaigns. (From this value.3 An implementation of LP that used more resources could 6.3.4 Summarynarrow the gap by increasing the fraction of queries coveredby the “head”. However, as the number of distinct queries For optimizing clicks-per-dollar, OT-Clicks dramaticallyincreases rapidly for each fraction of additional coverage, outperformed LP-Clicks by using all the data. For op-every doubling of resources will only yield incremental gains. timizing profit-per-dollar, OT-Profit matched BidScal-
  9. 9. Figure 7: Impact on profit-per-dollar, over budget Figure 9: Multiple objectives: impact on Profit-per-constrained campaigns. The baseline is VPT. dollar. The baseline is VPT.Figure 8: Impact on CTR (including all campaigns). Figure 10: Multiple objectives: impact on CTR. TheThe baseline is VPT. baseline is VPT.ing on profit-per-dollar, while yielding better CTR (quality was significantly less than in the simulations, since we havefor users). If the primary goal was quality, OT-CTR blew complete knowledge of future queries in the simulations, un-away the other algorithms, while still improving advertiser like the partial predictability of live traffic.profit-per-dollar and clicks-per-dollar. Finally, blending mul- For OT-CTR, the experiments showed statistically signif-tiple metrics can yield better results than a single metric, icant improvements in quality for users, clicks, and conver-since different advertisers have more scope for optimization sions, while revenue was neutral – a Pareto improvement toalong different metrics. all objectives. Gains in conversions per dollar were signifi- One might wonder whether implementation of such tech- cantly higher than the gains in clicks per dollar, since CTRniques would incentivize budget unconstrained advertisers is correlated with conversion rate. Thus by shifting spendto lower their budgets and become budget constrained. The to ads with higher CTR, we also increased the number ofanswer is negative: BidScaling did slightly better than conversions.OT-Profit from an advertiser’s perspective (ignoring qual-ity for users), and BidScaling only used the information 7. CONCLUSIONavailable to the search engine, not the additional informa- We studied the problem of allocating budget constrainedtion available to the advertiser. With additional information spend in order to maximize objectives such as quality forabout conversion rates for each keyword, or the true value users, or ROI for advertisers. We introduced the concept ofof each click (rather than using the bid as the proxy for fair allocations (analogous to Nash equilibriums), and con-value), the advertiser easily get better ROI by optimizing strained the space of algorithms to those that yielded fairtheir campaign (versus becoming budget constrained). allocations. We were also constrained (in our setting) to not modify bids. We proposed a family of Optimized Throttling6.4 Live Traffic Experiments algorithms that work within these constraints, and can be We implemented our algorithms in Google’s production used to optimize different objectives. In fact, they can bead serving system, and ran experiments on live traffic, with tuned to pick an arbitrary point in the tradeoff curve be-both OT-CTR and BidScaling. The results were consis- tween multiple objectives.tent with our simulations, though the magnitude of the gains Prior approaches such linear programming and bid scaling
  10. 10. are not applicable in our setting: linear programming yields [11] D. X. Charles, M. Chickering, N. R. Devanur, K. Jain,unfair allocations, and bid scaling changes bids. It was nev- and M. Sanghi. Fast algorithms for finding matchingsertheless interesting to study how much of a penalty (if any) in lopsided bipartite graphs with applications toour algorithms pay for working within these constraints (fair display ads. In ACM Conference on Electronicallocations, fixed bids). We found that, surprisingly, our al- Commerce, pages 121–128, 2010.gorithms dramatically outperform linear programming – by [12] Y. Chen, P. Berkhin, B. Anderson, and N. Devanur.being fast enough to use all the data rather than being lim- Real-time bidding algorithms for performance-basedited to head queries. Our algorithms are also competitive display ad allocation. In KDD, pages 1307–1315.with bid scaling on advertiser metrics, while yielding better ACM, 2011.ad quality for users. [13] N. R. Devanur and T. P. Hayes. The adwords problem: The Optimized Throttling algorithms are designed for im- online keyword matching with budgeted bidders underplementation in a high throughput production system. The random permutations. In ACM Conference oncomputation overhead at serving time is negligible: just a Electronic Commerce, pages 71–78, 2009.few comparisons. The algorithms also have a minimal mem- [14] B. Edelman, M. Ostrovsky, and M. Schwarz. Internetory footprint, as little as 8 bytes (plus hash table overhead) Advertising and the Generalized Second-Priceper advertiser. Finally, they are robust with respect to errors Auction. American Economic Review, 97(1):242–259,in estimating future traffic, since they only need the total 2007.volume of traffic and the distribution of the chosen metric, [15] J. Feldman, M. Henzinger, N. Korula, V. S. Mirrokni,not the number of occurrences of each query. We validated and C. Stein. Online stochastic packing applied toour system design by implementing our algorithms in the display ad allocation. In ESA (1), pages 182–194,Google ads serving system, and running experiments on live 2010.traffic. The experiments showed significant improvements in [16] J. Feldman, N. Korula, V. S. Mirrokni,both advertiser ROI (conversions per dollar) and user expe- S. Muthukrishnan, and M. P´l. Online ad assignment arience. with free disposal. In WINE, pages 374–385, 2009.Acknowledgments: We thank Anshul Kothari for his con- [17] J. Feldman, S. Muthukrishnan, M. Pal, and C. Stein.tributions to the algorithms and system design. Budget optimization in search-based advertising auctions. In EC, 2007.8. REFERENCES [18] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In 19th ACM Symposium on [1] Google automatic bidding product. Operating Systems Principles, 2003. http://adwords.google.com/support/aw/bin/ [19] G. Goel and A. Mehta. Online budgeted matching in answer.py?hl=en&answer=113234. random input models with applications to Adwords. [2] Google conversion optimizer product. http: In SODA, 2008. //www.google.com/adwords/conversionoptimizer/. [20] K. Hosanagar and V. Cherepanov. Optimal bidding in [3] Protocol buffers. Website, 2008. stochastic budget constrained slot auctions. In EC, http://code.google.com/p/protobuf. 2008. [4] Z. Abrams. Revenue maximization when bidders have [21] M. Mahdian, H. Nazerzadeh, and A. Saberi. budgets. In SODA, 2006. Allocating online advertisement space with unreliable [5] Z. Abrams, S. Keerthi, O. Mendelevitch, and estimates. In EC, 2007. J. Tomlin. Ad delivery with budgeted advertisers: a [22] A. Mehta, A. Saberi, U. V. Vazirani, and V. V. comprehensive lp approach. J. Electronic Commerce Vazirani. Adwords and generalized online matching. J. Research, 9(1), 2008. ACM, 54(5), 2007. [6] G. Aggarwal, A. Goel, and R. Motwani. Truthful [23] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. auctions for pricing search keywords. In EC, 2006. Interpreting the data: Parallel analysis with sawzall. [7] C. Borgs, J. Chayes, N. Immorlica, K. Jain, Scientific Programming Journal, 13:277–298, 2005. O. Etesami, and M. Mahdian. Dynamics of bid [24] P. Rusmevichientong and D. Williamson. An adaptive optimization in online advertisement auctions. In algorithm for selecting profitable keywords for Proc. of the 16th international conference on World search-based advertising services. In EC, 2006. Wide Web, pages 531–540. ACM, 2007. [25] H. Varian. Position auctions. International Journal of [8] C. Borgs, J. Chayes, N. Immorlica, M. Mahdian, and Industrial Organization, 25(6):1163–1178, 2007. A. Saberi. Multi-unit auctions with [26] E. Vee, S. Vassilvitskii, and J. Shanmugasundaram. budget-constrained bidders. In EC, pages 44–51, 2005. Optimal online assignment with forecasts. In ACM [9] N. Buchbinder, K. Jain, and J. Naor. Online Conference on Electronic Commerce, pages 109–118, Primal-Dual Algorithms for Maximizing Ad-Auctions 2010. Revenue. In ESA, 2007.[10] M. Cary, A. Das, B. Edelman, I. Giotis, K. Heimerl, A. Karlin, C. Mathieu, and M. Schwarz. Greedy bidding strategies for keyword auctions. In Proc. of the 8th ACM conference on Electronic commerce, pages 262–271. ACM New York, NY, USA, 2007.