Does Retail Advertising Work?
Measuring the Effects of Advertising on Sales via a Controlled Experiment on Yahoo!

Randall Lewis and David Reiley*

This Version: 28 December 2009

Abstract

A randomized experiment performed in cooperation between Yahoo! and a major retailer allows us to measure the effects of online advertising on sales. We exploit a match of over one million customers between the databases of Yahoo! and the retailer, assigning them to treatment and control groups for an online advertising campaign for this retailer and then measuring each individual's weekly sales at this retailer, both online and in stores. By combining a controlled experiment with panel data on purchases, we find statistically and economically significant impacts of the advertising on sales. The treatment effect persists for weeks after the end of an advertising campaign, and we estimate the total effect on revenues to be more than eleven times the retailer's expenditure on advertising during the study. Additional results explore differences in the number of advertising impressions delivered to each individual, age and gender demographics, online and offline sales, and the effects of advertising on those who click the ads versus those who merely view them. Our results provide the best measurements to date of the effectiveness of image advertising on sales, and we shed light on important questions about online advertising in particular.

* Lewis: Massachusetts Institute of Technology, randallL@mit.edu. Reiley: Yahoo! Research and University of Arizona, reiley@eller.arizona.edu. We thank Meredith Gordon, Sergiy Matusevych, and especially Taylor Schreiner for their work on the experiment and the data. Yahoo! Incorporated provided financial and data assistance, as well as guaranteeing academic independence prior to our analysis, so that the results could be published no matter how they turned out. We acknowledge the helpful comments of Manuela Angelucci, JP Dubé, Kei Hirano, John List, Preston McAfee, Paul Ruud, Michael Schwarz, Pai-Ling Yin, and seminar participants at University of Arizona, University of California at Davis, New York University, Sonoma State University, Vassar College, the FTC Microeconomics Conference, and Economic Science Association meetings in Pasadena, Lyon, and Tucson.
I. Introduction

The retailing pioneer John Wanamaker (1838-1922) famously remarked, "Half the money I spend on advertising is wasted; the trouble is I don't know which half." Measuring the impact of advertising on sales has remained a difficult problem for more than a century. A particular problem has been obtaining data with exogenous variation in the level of advertising. In this paper, we present the results of a field experiment that systematically exposes some individuals but not others to online advertising, and measures the impact on individual-level sales.

With non-experimental data, one can easily draw mistaken conclusions about the impact of advertising on sales. To understand the state of the art among marketing practitioners, we consider a recent Harvard Business Review article (Abraham (2008)) written by the president of comScore, a key online-advertising information provider that logs the Internet browsing behavior of a panel of two million users worldwide. The article, which reports large increases in sales due to online advertising, describes its methodology as follows: "Measuring the online sales impact of an online ad or a paid-search campaign—in which a company pays to have its link appear at the top of a page of search results—is straightforward: We determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it."

We caution that this straightforward technique may give spurious results. The population of people who see a particular ad may be very different from the population who do not. For example, those people who see an ad for eTrade on the page of Google search results for the phrase "online brokerage" are a very different population from those who do not see that ad (because they did not search for that phrase). We might reasonably assume that those who search for "online brokerage" are much more likely to sign up for an eTrade account than those who do not. Thus, the observed difference in sales might not be a causal effect of ads at all, but instead might reflect a difference between these populations. In econometric terms, the analysis omits the variable of whether someone searched for "online brokerage" or not, and because this omitted variable is correlated with sales, we get a biased estimate. (Indeed, below we demonstrate that in our particular application, if we had used only non-experimental cross-sectional variation in advertising exposure across individuals, we would have obtained a very biased estimate of the effect of advertising on sales.) To pin down the causal effect, it would be preferable to conduct an experiment that holds the population constant between the two conditions: a treatment group of people who search for "online brokerage" would see the eTrade ad, while a control group would not.

The relationship between sales and advertising is literally a textbook example of the endogeneity problem in econometrics, as discussed by Berndt (1991) in his applied-econometrics text. Theoretical work by authors such as Dorfman and Steiner (1954) and Schmalensee (1972) shows that we might expect advertisers to choose the optimal level of advertising as a function of sales, so that regressions to determine advertising's effects on sales are plagued by the possibility of reverse causality. Berndt (1991) reviews a substantial econometric literature on this topic.
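For concreteness, the optimality condition in the cited theory can be stated compactly. This is the standard Dorfman-Steiner (1954) result as usually presented in the literature, not a derivation given in this paper: a firm choosing price p and advertising spending A to maximize profit sets the advertising-to-revenue ratio equal to the ratio of the advertising elasticity of demand to the (absolute) price elasticity,

A / (p·q) = ε_A / |ε_p|,   where ε_A = (∂q/∂A)(A/q) and ε_p = (∂q/∂p)(p/q).

Optimal advertising thus moves together with sales revenue p·q, which is exactly why regressions of sales on advertising suffer from reverse causality.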
After a year of interactions with advertisers and advertising sales representatives at Yahoo!, we have noticed a distinct lack of knowledge about the quantitative effects of advertising. This suggests that the economic theory of advertising has likely gotten ahead of practice, in the sense that advertisers (like Wanamaker) typically do not have enough quantitative information to be able to choose optimal levels of advertising. They may well choose advertising budgets as a fraction of sales (producing econometric endogeneity, as discussed in Berndt (1991)), but these are likely rules of thumb rather than informed, optimal decisions. Systematic experiments, which might measure the causal effects of advertising, are quite rare in practice.

In general, advertisers do not systematically vary their levels of advertising to measure the effects on sales.[1] Advertisers often change their levels of advertising over time, as they run discrete "campaigns" during different calendar periods, but this variation does not produce clean data for measuring the effects of advertising because other variables also change concurrently over time. For example, if a retailer advertises more during December than in other months, we do not know how much of the increased sales to attribute to the advertising, and how much to increased holiday demand.

[1] Notable exceptions include direct-mail advertising and, more recently, search-engine advertising, where advertisers do run frequent experiments (on advertising copy, targeting techniques, etc.) in order to measure direct-response effects by consumers. In this study, we address brand advertising, where the expected effects have to do with longer-term consumer goodwill rather than direct responses. In this field, advertising's effects are much less well understood.

As is well known in the natural sciences, experiments are a great way to establish and measure causal relationships. Randomizing a policy across treatment and control groups allows us to vary advertising in a way that is uncorrelated with all other factors affecting sales, thus eliminating econometric problems of endogeneity and omitted-variable bias. This recognition has become increasingly important in economics and the social sciences; see Levitt and List (2008) for a summary. We add to this recent literature with an unusually large-scale field experiment involving over one million subjects.

A few previous research papers have also attempted to quantify the effects of advertising on sales through field experiments. Several studies have made use of IRI's BehaviorScan technology, a pioneering technique developed for advertisers to experiment with television ads and measure the effects on sales. These studies developed panels of households whose sales were tracked with scanner data, and arranged to split the cable-TV signal to give increased exposures of a given television ad to the treatment group relative to the control group. The typical experimental sample size was approximately 3,000 households. Abraham and Lodish report on 360 studies done for different brands, but many of the tests turned out to be statistically insignificant. Lodish et al. (1995a) report that only 49% of the 360 tests were significant at the 20% level, and then go on to perform a meta-analysis showing that much of the conventional wisdom among advertising executives did not help to explain which ads were relatively more effective in influencing sales. Lodish et al. (1995b) investigated long-run effects, showing that for those ads that did produce statistically significant results during a year-long experiment, there tended to be positive effects in the two following years as well. Hu, Lodish, and Krieger (2007) perform a follow-up study and find that similar tests conducted after 1995 produce larger impacts on sales, though more than two-thirds of the tests remain statistically insignificant.

More recently, Anderson and Simester (2008) experimented with a catalog retailer's frequency of catalog mailings, a direct-mail form of retail advertising. A sample of 20,000 customers received either twelve or seventeen catalog mailings over an eight-month period. When customers received more mailings, they exhibited increased short-run purchases. However, they also found evidence of intertemporal substitution, with the firm's best customers making up for short-run increases in purchases with longer-run decreases in purchases.

Ackerberg (2001, 2003) makes use of non-experimental individual-level data on yogurt advertising and purchases for 2,000 households. By exploiting the panel nature of the dataset, he shows positive effects of advertising for a new product (Yoplait 150), particularly for consumers previously inexperienced with the product. For a comprehensive summary of the theoretical and empirical literature on advertising, see Bagwell (2005).

The remainder of this paper is organized as follows. We present the design of the experiment in Section II, followed by a description of the data in Section III. In Section IV, we measure the effect on sales during the first of three advertising campaigns in this experiment. In Section V, we demonstrate and measure the persistence of this effect after the campaign has ended. In Section VI, we examine how the treatment effect of online advertising varies across a number of dimensions. This includes the effect on online versus offline sales, the effect on those who click ads versus those who merely view them, the effect for users who see a low versus high frequency of ads, the effect by age and gender demographics, and the effect on the number of customers purchasing versus the size of the average purchase. The final section concludes.

II. Experimental Design

This experiment randomized whether individuals were exposed to a nationwide retailer's display-advertising campaign on Yahoo! We then measure the impact of the advertising on individuals' weekly purchases, both online and in stores. To achieve this end, we made use of matches between both postal and email addresses in the retailer's customer database and the addresses in Yahoo!'s user database. This match yielded a sample of 1,577,256 individuals.[2]

[2] The retailer gave us a portion of their entire database, selecting those customers they were most interested in experimenting on. We do not have precise information about their exact selection rule.
Of these matched users, we assigned 81% to a treatment group who subsequently viewed three advertising campaigns on Yahoo! from the retailer. The remaining 19% were assigned to the control group and saw none of the retailer's ads on Yahoo! The simple randomization was designed to make the treatment-control assignment independent of all other relevant variables.

The treatment group of 1.3 million Yahoo! users was exposed to three different advertising campaigns over the course of four months, separated by approximately one-month intervals. Table 1 gives summary statistics for the three campaigns, which delivered 32 million, 10 million, and 17 million impressions, respectively. Figure 1 shows that by the end of the third campaign, a total of 924,000 users had been exposed to ads. These individuals viewed an average of 64 ad impressions per person.

These represent the only ads shown by this retailer on Yahoo! during this time period. However, Yahoo! ads represent a small fraction of the retailer's overall advertising budget, which included other media such as newspaper and direct mail. Yahoo! advertising turns out to explain a very small fraction of the variance in weekly sales, but because of the randomization, the ads are uncorrelated with any other influences on shopping behavior.

The campaigns in this experiment consisted of "run-of-network" ads on Yahoo! This means that ads appeared on various Yahoo! properties, such as mail.yahoo.com, groups.yahoo.com, and maps.yahoo.com. Figure 2 shows a typical display advertisement placed on Yahoo! The large rectangular ad for Netflix[3] is similar in size and shape to the advertisements in this experiment.

[3] Netflix was not the retailer featured in this campaign, but is an example of a firm which makes sales only online and advertises on Yahoo!

Following the experiment, Yahoo! and the retailer sent data to a third party, who matched the retail sales data to the Yahoo! browsing data. The third party then anonymized the data to protect the privacy of customers. In addition, the retailer disguised actual sales amounts by multiplying by an undisclosed number between 0.1 and 10. All financial figures involving treatment effects and sales will be reported in R$, or "Retail Dollars," rather than US dollars.

III. Sales and Advertising Data

Table 2 provides some summary statistics that indicate a successful randomization. The treatment group was 59.7% female while the control group was 59.5% female, a statistically insignificant difference (p=0.212). During Campaign #1, the proportion of individuals who did any browsing on the Yahoo! network was 76.4% in each group (p=0.537). The mean number of Yahoo! page views was 363 pages for the treatment group versus 358 in the control group, another statistically insignificant difference (p=0.627).
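These balance checks are simple two-sample tests. As a minimal sketch, the following standard-library Python reproduces the mechanics of the gender comparison; the group sizes are inferred from the 81/19 split, and because Table 2's percentages are rounded, the published p=0.212 will not be recovered exactly from these inputs.

```python
from math import sqrt, erfc

def two_prop_ztest(p1, n1, p2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))  # two-sided p-value

# Hypothetical illustration using the rounded Table 2 figures; the paper's
# p = 0.212 was computed from unrounded microdata, so two-decimal summary
# percentages will not reproduce it exactly.
n_total = 1_577_256
n_treat, n_ctrl = round(0.81 * n_total), round(0.19 * n_total)
z, p = two_prop_ztest(0.597, n_treat, 0.595, n_ctrl)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```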
The treatment of viewing advertisements was delivered randomly by the Yahoo! ad server, such that even though 76.4% of the treatment group visited a Yahoo! website, only 63.7% of the treatment group was actually shown the retailer's ads. On average, a visitor was shown the ads on only 7.0% of the pages they visited during Campaign #1, but the probability of being shown on any particular page depended on several factors including, but not limited to, the property visited, specific content on the page, and user demographics.

The number of ads viewed by each Yahoo! user in this campaign is quite skewed. The match between the Yahoo! data and the retailer's sales data should tend to reduce the number of non-human "bots," or automated browsing programs, since a would-be "bot" would have to make a purchase at the retailer in order to be included in our sample. However, there still remains a very small percentage of users who have extreme browsing behavior. Figure 3 shows a frequency histogram of the number of the retailer's ads viewed by treatment-group members who saw at least one of the ads during Campaign #1. The majority of users saw fewer than 100 ads, with a mere 1.0% viewing more than 500 of the ads during the two weeks of the online ad campaign. The maximum number of the ads viewed during the campaign period by one individual was 6,050.[4]

[4] Although the data suggest extreme numbers of ads, Yahoo! engages in extensive "anti-click-fraud" efforts to ensure fair pricing of its products and services. In particular, not all ad impressions in the dataset were deemed valid impressions and charged to the retailer.

One standard statistic in online advertising is the click-through rate, or fraction of ads that were clicked by a user. The click-through rate for this campaign was 0.28%. With detailed user data, we can also tell that the proportion of the designated treatment group who clicked at least one ad in this campaign was 4.6%. Of those who actually saw at least one ad, the fraction who clicked at least one ad was 7.2%.

In order to protect the privacy of individual users, a third party matched the retailer's sales data to the Yahoo! browsing data and anonymized all observations so that neither party could identify individual users in the matched dataset. This weekly sales data includes both online and offline sales and spans approximately 18 weeks: 3 weeks preceding, 2 weeks during, and the week following each of the three campaigns. Sales amounts include all purchases that the retailer could link to each individual customer in the database, primarily by use of credit-card information.[5]

[5] To the extent that these customers make purchases (such as with cash) that cannot be tracked by the retailer, our estimate may underestimate the total effect of advertising on sales. However, the retailer believes that they track at least 90% of purchases for these customers.

Table 3 provides a weekly summary of the sales data, while Figure 4 decomposes the sales data into online and offline components. We see that offline (in-store) sales represent 86% of the total. Combined weekly sales are quite volatile, even though aggregated across 1.6 million individuals, ranging from less than R$0.60 to more than R$1.60 per person. The standard deviation of sales across individuals is much larger than the mean, at approximately R$14. The mean includes a large mass of zeroes, as fewer than 5% of individuals in a given week make any transaction (see the last column of Table 4). For those who do make a purchase, the transaction amounts exhibit large positive and negative values (the latter representing returns), but well over 90% of purchase amounts lie between -R$100 and +R$200.

We next look at the results of the experiment for the first of the three advertising campaigns. Throughout the paper we primarily focus on Campaign #1, for several reasons. First, more than 60% of the ads during all three campaigns were shown during those two weeks. Second, subsequent campaigns were shown to the same treatment and control groups, preventing us from examining before-and-after differences for those campaigns on their own. Third, even with 1.6 million customers, the expected magnitude of any treatment effect that depends on frequency is likely too small for us to have the statistical power to estimate, given the high volatility of the data. With these issues taken into consideration, we examine the effects of Campaign #1.

IV. Basic Treatment Effect in Campaign #1

For Campaign #1, we are primarily interested in estimating the effect of the treatment on the treated individuals. In traditional media such as TV commercials, billboards, and newspaper ads, the advertiser must pay for the advertising space, regardless of the number of people who actually see the ad. With online display advertising, by contrast, it is easy to track potential customers and bill an advertiser by the number of impressions delivered. Although there is a significant difference between a delivered ad and a seen ad, the ability to count the number of attempted exposures to an individual allows us to investigate the treatment effect in several ways, such as by focusing on those who were delivered at least one ad impression.

Table 5 gives initial results comparing sales between treatment and control groups. We look at total sales (online and offline) during the two weeks of the campaign, as well as total sales during the two weeks prior to the start of the campaign. During the campaign, we see that the treatment group purchased R$1.89 per person, compared to the control group at R$1.84 per person. This suggests a positive treatment effect of approximately R$0.05 per person, but the effect is not statistically significant at conventional levels (p=0.162).

For the two weeks before the campaign, the control group purchased slightly (and statistically insignificantly) more than the treatment group: R$1.95 versus R$1.93. We can combine the pre- and post-campaign data to obtain a difference-in-differences estimate of the increase in sales for the treatment group relative to the control. This technique gives a slightly larger estimate of R$0.06 per person, but it is again statistically insignificant at conventional levels (p=0.227).

Because only 64% of the treatment group was actually treated with ads, this simple treatment-control comparison has been diluted by the 36% of individuals who did not see any ads during this campaign. (Recall that they did not see ads because of their individual browsing choices.) Ideally, we would remove these 36% of individuals both from the treatment and control groups in order to get an estimate of the advertising treatment on those who could be treated. Unfortunately, we are unable to determine which control-group members would have seen ads for this campaign had they been in the treatment group.[6] Instead of removing these individuals, we scale up our diluted treatment effect (R$0.05) by dividing by 0.64, the fraction of individuals treated.[7] This gives us an estimate of the treatment effect on those treated with ads: R$0.083. The standard error is also scaled proportionally, leaving the level of statistical significance unaffected (p=0.162).

[6] The Yahoo! ad server uses a complicated set of rules and constraints to determine which ad will be seen by a given individual on a given page. For example, a given ad might be shown more often on Yahoo! Mail than on Yahoo! Finance. If another advertiser has targeted females under 30 during the same time period, then this ad campaign may have been relatively more likely to be seen by other demographic groups. Our treatment-control assignment represented an additional constraint, and we were unfortunately unable to observe exactly which control-group members might have seen ads for this campaign.

[7] This is equivalent to estimating the local average treatment effect (LATE) via instrumental variables in the model

Sales_i = α + γ·SawAds_i + ε_i,   SawAds_i = π_0 + π_1·Z_i + ν_i,

where SawAds_i is an indicator for whether the number of the retailer's ads seen is greater than zero, and the first stage regresses this indicator on the exogenous treatment-control randomization Z_i.

The last two rows of the table show us an interesting difference between those treatment-group members who saw ads in this campaign and those who did not. Before the campaign, those treatment-group members who would eventually see online ads purchased considerably less (R$1.81) than those who would see no ads (R$2.15). This statistically significant difference (p<0.01) is evidence of heterogeneity in shopping behavior that happens to be correlated with ad views (through Yahoo! browsing behavior). That is, for this population of users, those who browse Yahoo! more actively also have a tendency to purchase less at the retailer, independent of the number of ads shown. Therefore, it would be a mistake to exclude from the study those treatment-group members who saw no online ads, because the remaining treatment-group members would not represent the same population as the control group. Such an analysis would result in selection bias towards finding a negative effect of ads on sales, because the selected treatment-group members purchase an average of R$1.81 in the absence of any advertising treatment, while the control-group members purchase an average of R$1.95, a statistically significant difference of R$0.13 (p=0.002). The pre-campaign data are crucial in allowing us to demonstrate the magnitude of this possible selection bias.

During the campaign, a sales difference between treated and untreated members of the treatment group persists, but it becomes smaller. While untreated individuals' sales dropped by R$0.10 from before the campaign, treated individuals' sales remained constant. (Control-group mean sales also fell by R$0.10 during the same period, just like the untreated portion of the treatment group.) This suggests that advertisements may be preventing treated individuals' sales from falling as untreated individuals' sales did. This will lead us to our preferred estimator below, a difference in differences between treated and untreated individuals (where "untreated" includes both control-group members and untreated members of the designated treatment group).
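The rescaling above is just the Wald/IV estimator of footnote [7] computed by hand. A minimal sketch in Python; the standard-error value is a placeholder for illustration, since the paper reports only the rounded estimate and p-value (the reported R$0.083 implies an unrounded diluted estimate near R$0.053):

```python
def effect_on_treated(itt_estimate, itt_se, fraction_treated):
    """Scale a diluted intent-to-treat (ITT) estimate up to an average
    effect on the treated; the standard error scales by the same factor,
    so the p-value is unchanged."""
    scale = 1.0 / fraction_treated
    return itt_estimate * scale, itt_se * scale

# R$0.053 / 0.637 = R$0.083, matching the reported effect on the treated.
# The SE of 0.038 is a made-up stand-in, not a number from the paper.
tot, tot_se = effect_on_treated(0.053, 0.038, 0.637)
print(f"effect on treated: R${tot:.3f} (se R${tot_se:.3f})")
```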
Next we look at the shape of the distribution of sales. Figure 5 compares histograms of sales amounts for the treatment group and control group, omitting those individuals for whom there was no transaction. For readability, these histograms exclude the most extreme outliers, trimming approximately 0.5% of the positive purchases from both the left and the right of the graph.[8] Relative to the control, the treatment density has less mass in the negative part of the distribution (net returns) and more mass in the positive part of the distribution. These small but noticeable differences both point in the direction of a positive treatment effect, especially when we recall that this diagram is diluted by the 36% of customers who did not browse enough to see any ads on Yahoo! Figure 6 plots the difference between the two histograms in Figure 5. The treatment effect is the average over this difference between treatment and control sales distributions.

[8] We trim about 400 observations from the left and 400 observations from the right, of a total of 75,000 observations with positive purchase amounts. These outliers do not seem to be much different between treatment and control. We leave all outliers in our analysis, despite the fact that they increase the variance of our estimates. Because all data were recorded electronically, we have no reason to suspect coding errors.

Above, we noted that 36% of the treatment group did not see ads, and we are unable to identify the corresponding 36% of the control group who would not have seen ads, in order to remove both groups from the analysis. However, we were able to obtain the total number of pages browsed on the Yahoo! network during the campaign for each individual in both the treatment and control groups.[9] We know that someone who viewed no pages would not have viewed ads for this retailer, though the converse is not also true. We find that 76.4% of both treatment and control groups had nonzero page views on the Yahoo! network during the campaign.

Table 5 shows our results for those individuals observed to have page views during the campaign. Excluding the 23.6% of individuals who did not browse the Yahoo! network, we obtain a statistically significant treatment-control difference of R$0.078. Even this result is somewhat diluted by individuals who did not actually view ads. We know that 63.7%/76.4% = 83.4% of those who saw pages actually saw this retailer's ads, so we again scale up the treatment effect by dividing by 0.834. This yields an average effect on the treated of R$0.093, which is marginally statistically significant (p=0.09).

[9] The original dataset did not contain data on individuals' page views, so including this variable required a data merge. Some observations could not be uniquely matched using the available matching variables. All page-view values for such an observation were attached to an observation that had matching values for the matching variables, but which may not have been precisely the same observation. To handle this imperfectly merged data correctly, we grouped together all imperfectly matched observations and averaged their independent variables in order to eliminate this measurement error. We note that when two observations are mismatched, if we average their independent variables, we eliminate the measurement error of that mismatch, since their measurement errors are perfectly negatively correlated. By doing this, we preserve the unbiasedness of our least-squares regressions. For more details, see Lewis (2008).
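To make footnote [9]'s cancellation argument concrete (a sketch of the logic, with notation not used elsewhere in the paper): suppose observations i and j have their page-view regressors swapped, so that the recorded values carry equal and opposite errors. Then

x̃_i = x_i + e,   x̃_j = x_j - e   ⟹   (x̃_i + x̃_j)/2 = (x_i + x_j)/2,

so replacing each imperfectly matched group by its within-group average removes the mismatch error exactly, and least squares on the averaged observations remains unbiased; GLS then accounts for the unequal variances that the grouping induces.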
Next we exploit the panel nature of our data by using a difference-in-differences (DID) model. This allows us to estimate the effects of advertising on sales while controlling for the heterogeneity we have observed across individuals in their purchasing behavior. Our DID model makes use of the fact that we observe the same individuals both before and after the start of the ad campaign. We begin with the following equation:

Sales_{i,t} = β_t + γ·SawAds_{i,t} + α_i + ε_{i,t}.

In this equation, Sales_{i,t} is the sales for individual i in time period t, SawAds_{i,t} is the dummy variable indicating whether individual i saw any of the retailer's ads in time period t, γ is the average effect of viewing the ads, β_t is a time-specific mean, α_i is an individual effect or unobserved heterogeneity (which we know happens to be correlated with viewing ads), and ε_{i,t} is an idiosyncratic disturbance. Computing time-series differences will enable us to eliminate the individual unobserved heterogeneity α_i.

We consider two time periods: (1) the "pre" period of two weeks before the start of Campaign #1, and (2) the "post" period of two weeks after the start of the campaign. By computing first differences of the above model across time, we obtain:

Sales_{i,post} - Sales_{i,pre} = (β_post - β_pre) + γ·(SawAds_{i,post} - SawAds_{i,pre}) + (ε_{i,post} - ε_{i,pre}).

Since no one saw ads in the "pre" period, we know that SawAds_{i,pre} = 0. So the difference equation simplifies to:

ΔSales_i = Δβ + γ·SawAds_{i,post} + Δε_i.

We can then estimate this difference equation via ordinary least squares (OLS). The coefficient γ is directly comparable to the previous "rescaled" estimates, as it measures the effect of the treatment on the treated. Note that in this specification, unlike the previous specifications, we pool together everyone who saw no ads in the campaign, including both the control group and those treatment-group members who turned out not to see any ads.

Using difference in differences, the estimated average treatment effect of being treated by viewing at least one of the retailer's ads during the campaign is R$0.102, with a standard error of R$0.043. This effect is statistically significant (p<0.01) as well as economically significant, representing an average increase of 5% on treated individuals' sales. Based on the 814,052 treated individuals, the estimate implies an increase in revenues for the retailer of R$83,000 ± 68,000 (95% confidence interval) due to the campaign. Since the cost of Campaign #1 was approximately R$20,000,[10] the point estimate suggests that the ads produced four times as much revenue as they cost the retailer. From this follows our conclusion that "retail advertising works!"

[10] These advertisements were more expensive than a regular run-of-network campaign. The database match was a form of targeting that commanded a large premium. In our cost estimates, we report the dollar amounts (scaled by the retailer's "exchange rate") actually paid by the retailer to Yahoo!
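As a minimal sketch of this estimator on synthetic data (all names and parameter values here are illustrative stand-ins, not the actual microdata), the first-differenced regression is a single OLS of the sales change on the exposure dummy:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)          # synthetic stand-in for the panel
n = 100_000
saw_ads = rng.random(n) < 0.52          # roughly the share actually treated
alpha_i = rng.normal(0, 5, n)           # unobserved heterogeneity (cancels)
pre_sales = alpha_i + rng.normal(1.9, 14, n)
post_sales = alpha_i + rng.normal(1.8, 14, n) + 0.10 * saw_ads

diff = post_sales - pre_sales           # first difference eliminates alpha_i
X = sm.add_constant(saw_ads.astype(float))
fit = sm.OLS(diff, X).fit()
print(fit.params[1], fit.bse[1])        # gamma-hat and its standard error
```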
The main identifying assumption of the DID model is that each individual's idiosyncratic tendency to purchase from the retailer is constant across time, and thus the treatment variable is uncorrelated with the DID error term. This assumption could be violated if some external event at some point during the experiment had different effects on the retail purchases of those who did and did not see ads. For example, perhaps in the middle of the time period studied, the retailer did a direct-mail campaign we don't know about, and the direct mail was more likely to reach those individuals in our study who browsed less often on Yahoo!. Fortunately, our previous experimental estimates are very similar in magnitude to the DID estimates: R$0.083 for the simple comparison of levels between treatment and control, R$0.093 for the same estimate restricted to those individuals with positive Yahoo! page views, and R$0.102 for the DID estimate.

The similarity between these three different estimates reassures us about the validity of our DID specification. We note that there are two distinctions between our preferred DID estimate and our original treatment-control estimate. First, DID looks at pre-post differences for each individual. Second, DID compares between treated and untreated individuals (pooling part of the treatment group with the control group), rather than simply comparing between treatment and control groups. We perform a formal specification test of this latter difference by comparing pre-post sales differences in the control group versus the untreated portion of the treatment group. The untreated portion of the treatment group has a mean pre-post difference just R$0.001 less than that of the control group, and we cannot reject the hypothesis that these two groups are the same (p=0.988).

V. Persistence of the Effects

Our next question concerns the longer-term effects of the advertising after the campaign has ended. One possibility is that the effects are persistent, increasing sales even after the campaign is over. Another is that the effects are short-lived, lasting only during the period of the campaign. A third possibility is that advertising could have negative long-run effects if it causes intertemporal substitution by shoppers: purchasing today something that they would otherwise have purchased a few weeks later. In this section, we distinguish empirically between these three competing hypotheses.

A. Sales in the Week After the Campaign Ended

We begin by focusing on the six weeks of data which we received from the retailer tailored to the purposes of analyzing Campaign #1: as previously mentioned, these include three weeks of data prior to Campaign #1 and three weeks following its start. To test the above hypotheses, we use the same difference-in-differences model as before, but this time we include in the "post" period the third week of sales results, the week following the two-week campaign. For symmetry, we also use all three weeks of sales in the "pre" period, in contrast to the results in the previous section, which were based on two weeks both pre and post. As before, the DID model compares the pre-post difference for treated individuals with the pre-post difference for untreated individuals (including both control-group members and untreated treatment-group members).

Before presenting our estimate, we first show histograms in Figure 7 of the distributions of three-week pre-post sales differences. Note three differences between Figure 7 and the histogram presented earlier in Figure 5: (1) we compare pre-post differences rather than levels of sales, (2) we compare treated versus untreated individuals rather than treatment versus control groups, and (3) we look at three weeks of sales data (both pre and post) rather than just two. The difference between the treated and untreated histograms can be found in Figure 8, with 95% confidence intervals for each bin indicated by the whiskers on each histogram bar. We see that the treated group has substantially more weight in positive sales differences, and substantially less weight in negative sales differences. This suggests a positive treatment effect, which we now measure via difference in differences.

The three-week DID results can be found in the final column of Table 6, with the two-week results included for comparison. Using our preferred DID estimator, we find that the estimated treatment effect increases from R$0.102 for two weeks to R$0.166 for three weeks.

Thus, the treatment effect in the third week appears to be just as large as the average effect per week during the two weeks of the campaign itself. To pin down the effects in the third week alone, we run a DID specification comparing the third week's sales with the average of the three pre-campaign weeks' sales. This gives us an estimate of R$0.061 with a standard error of R$0.024 (p=0.01), indicating that the effect in the third week is both statistically and economically significant.
B. More than One Week after the Campaign Ended

Could the effects be persistent even more than a week after the campaign ends? We investigate this question using sales data collected for Campaigns #2 and #3. Recall that for each campaign, we obtained three weeks of sales data before the start of the campaign, and three weeks of sales data after the start of the campaign. It turns out, for example, that the earliest week of pre-campaign sales for Campaign #2 happens to be the fourth week after the start of Campaign #1, so we can use that data to examine the treatment effect of Campaign #1 in its fourth week.[11]

[11] Because the campaigns did not start and end on the same day of the week, we end up with a three-day overlap between the "third week after the start of the campaign" and the "fourth week after the start of the campaign." That is, those three days of sales are counted twice. We correct for this double-counting in our final estimates of the total effect of advertising on sales.

In order to check for extended persistence of advertising, we use the same DID model as before, estimated on weekly sales. Our "pre-period" sales figure is the weekly average of sales in the three weeks preceding the start of Campaign #1. Our "post-period" sales figure is the sales during a given week after the start of Campaign #1. We then compute a separate DID estimate for each week, beginning with the first week of Campaign #1 and ending with the week following Campaign #3.[12][13][14]

[12] There is an 11-day period of unobserved sales between Campaign #2 and Campaign #3.

[13] Because two of the campaigns lasted ten days rather than an even number of weeks, in two cases the second "week" of a campaign consists of only three days instead of seven. In the cases of 3-day "weeks," we scale up the sales data by 7/3 to keep consistent units of sales per week.

[14] Unfortunately, data for Campaign #3 came in a separate file from the third-party matching service, without unique identifiers, so we could not link it directly to the data for Campaigns #1 and #2. We imperfectly matched the Campaign #3 data back in, using a match on observables that were common across data sets. In the merge, 80% of observations are uniquely matched and the remaining 20% are matched into groups. The combined number of groups and uniquely matched observations is 1,385,255. See footnote [9] and Lewis (2008) for details on the GLS regression we used to weight the group matches appropriately.

The results can be found in Table 7 and are represented graphically in Figure 9. In the figure, vertical lines indicate the beginning (solid) and end (dashed) of each of the three campaigns. The estimated treatment effects in later weeks thus include cumulative effects of all campaigns run to date. The average weekly treatment effect on the treated is R$0.045, with individual weekly estimates ranging from R$0.004 to R$0.080. Some of the individual weekly treatment effects are statistically indistinguishable from zero (95% confidence intervals are graphed in Figure 9), but, strikingly, every single one of the point estimates is positive. We particularly note the large, positive effects estimated during the inter-campaign periods, as many as four weeks after ads stopped showing for this retailer on Yahoo!

To obtain an estimate of the cumulative effect of all three campaigns, we use all fifteen weeks of data. We present the results of two different methods in Table 8. The first method estimates the average weekly treatment effect, comparing the average of the fifteen weeks after the start of Campaign #1 to the average of the three weeks prior to Campaign #1; the estimate is R$0.045. We then multiply this estimate by the number of independent weeks of sales data in the sample period, which is actually thirteen weeks and three days. This gives us an average treatment effect of the ads (on those who saw at least one ad during one of the three campaigns) of R$0.532, with a standard error of R$0.196.

A more econometrically efficient method is to compute an average of the 15 weekly estimates of the treatment effect, taking care to report standard errors that account for the covariances between regression coefficients across weeks. The table reports an optimally weighted average of the 15 per-week treatment effects,[15] with a simple average included for comparison. The weighted average is R$0.048 (R$0.014). We then multiply this figure by 13.43 to get a total effect across the entire time period of observation, since the "fifteen-week" time period actually includes a total of only thirteen weeks and three days. This multiplication gives us a figure of R$0.645 (R$0.183). This is slightly larger than the total effect reported in the previous paragraph, because it assumes that all users were treated from the beginning of Campaign #1.

[15] We implement the weighted average by computing a standard GLS regression of the fifteen coefficients on a constant, where the GLS weighting matrix is the covariance matrix among the fifteen regression coefficients. These covariances can be computed as

Cov(β̂_j, β̂_k) = (X_j′X_j)⁻¹ X_j′ Cov(ε_j, ε_k) X_k (X_k′X_k)⁻¹,

where the betas are the least-squares coefficients from regressing Y_j on X_j and Y_k on X_k. We estimate Cov(ε_j, ε_k) from the residuals of each regression. One could use the simple estimator

Cov(ε_j, ε_k) = σ̂_jk·I, with σ̂_jk = (1/n) Σ_i e_{ij} e_{ik},

but we instead use the heteroskedasticity-robust Eicker-White formulation:

Cov(ε_j, ε_k) = diag(e_{1j} e_{1k}, ..., e_{nj} e_{nk}).

To estimate the total benefit of the three campaigns, we take our estimate of R$0.645 and multiply it by the average number of users who had already been treated with ads in a given week, which turns out to be 864,000 (see the graph in Figure 1). This gives us a 95% confidence interval estimate of the total incremental revenues due to ads of R$560,000 ± 310,000. For comparison, the total cost of these advertisements to the advertiser was R$51,000. Thus, our point estimate says that the total revenue benefit of the ads was more than eleven times the cost of the campaign, and even the lower bound of our confidence interval indicates a benefit of more than six times the cost. Even more strongly than before, we conclude that retail advertising works!

Similar to the specification test computed for the DID estimate during the first three weeks, we consider a specification test of whether the control group and the untreated treatment-group individuals exhibit different time-varying purchasing behavior. We present the weekly estimates of this difference in Figure 10. During the first 9 weeks, there is no sizeable aberration to suggest rejection of the null; in fact, the control-group individuals on average purchase slightly less. Since the control group is composed of individuals who would and would not have seen ads, we infer that, for the subgroup who would have seen ads, the change in purchase behavior would have been even smaller than for the untreated treatment-group individuals. This suggests that DID may slightly underestimate the treatment effect during those weeks. During the last 6 weeks, however, there is some evidence that the DID estimates overestimate the treatment effect. While this may be problematic for inference on the individual weekly treatment effects, taken in aggregate over the entire 15-week span, the test fails to reject that the two groups are the same on average. Still, the effect on the point estimate of the cumulative treatment effect is approximately R$0.06, about 10% of the aggregate treatment effect. Thus, the results of the specification test call into question long-run extensions of DID when the pre-period is short and distant from the treatment period, as time-varying differences are potentially amplified over time.
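A minimal sketch of the optimally weighted average in footnote [15], with fabricated inputs standing in for the fifteen weekly DID coefficients and their covariance matrix:

```python
import numpy as np

def gls_weighted_average(betas, V):
    """Optimally weighted average of k coefficient estimates with
    covariance matrix V: a GLS regression of the betas on a constant.
    Returns the weighted mean and its standard error."""
    ones = np.ones(len(betas))
    Vinv = np.linalg.inv(V)
    var = 1.0 / (ones @ Vinv @ ones)
    mean = var * (ones @ Vinv @ betas)
    return mean, np.sqrt(var)

# Illustration only: the paper's actual weekly coefficients and their
# estimated covariances are not reproduced here.
betas = 0.048 + np.random.default_rng(1).normal(0, 0.01, 15)
V = np.diag(np.full(15, 0.02**2))
avg, se = gls_weighted_average(betas, V)
print(f"weighted average: {avg:.3f} (se {se:.3f})")
```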
C. Countercyclical Treatment Effects?

Before concluding this section, we make one more empirical observation. In Figure 11, we see how the baseline level of sales (measured using total purchases by the control group) and the treatment effect of advertising on sales both vary considerably from week to week. Because the treatment effect is an order of magnitude smaller than the baseline level of sales, we use two different vertical scales, as labeled on the right and left of the figure. We discover the surprising result that the treatment effect appears to vary inversely with the level of sales. Figure 12 illustrates this negative relationship with a scatter plot; the best-fit regression line has an R² of 0.6.

The higher the level of sales, the lower the absolute magnitude of the treatment effect. As Figure 12 shows, when per-person sales increase (within the sample) from R$0.75 to R$1.75, the treatment effect falls from just over R$0.05 to nearly zero.[16] In other words, the effects of retail advertising look countercyclical: having advertised tends to decrease the variance of weekly sales. This result is rather surprising: if the treatment effect had been a multiplicative percentage of sales, for example, we would have expected the opposite result. Thus, advertising appears to help the retailer boost sales during slower times of the year, smoothing sales.

[16] At first, we worried that this relationship might be a mechanical artifact of the DID estimator, which subtracts control-group sales (as well as the sales of other untreated individuals) from treated individuals' sales, so comparing this difference to the level of control-group sales might force a negative relationship. However, we got a very similar negative slope when we used treatment-group sales as the baseline sales amount, where one might expect mechanical artifacts to force a positive relationship. Having tried both groups to measure baseline sales, we feel sanguine about reporting the correlation using control-group sales.

We do not wish to overstate the representativeness of this result. We note that it comes from just fifteen observations of weekly sales data for a single retailer advertising on one particular website. As such, we consider the result to be merely suggestive rather than definitive. However, we do find the result quite interesting and are unaware of any previous results on this topic. It would be very valuable to see whether this result can be replicated in future advertising experiments, because it has potentially important implications for optimal advertising policies by retailers.

D. Summary of Persistence Results

To summarize the main result of this section, we find that the retail image advertising in this experiment led to persistent positive effects on sales for a number of weeks after the ads stopped showing. When we take these effects into account, we find a huge return to advertising for the period of our sample. It is possible, however, that we are still underestimating the returns to advertising, because we are missing two weeks of sales data (near Christmas) between Campaigns #2 and #3, and because our sales data end one week after the end of Campaign #3, when our previous results indicate that the advertising may well have persistent effects beyond the end of our sample period. We hope to investigate this persistence in future experiments with longer panels of sales data.

VI. Detailed Results for Campaign #1
In this section, we dig deeper into several other dimensions of the data. Data glitches and the size of the first campaign relative to the following two make it most practical to examine the effects of the first campaign only. To the extent that the persistence results above generalize to a longer period of time for the total treatment effect, we suspect similar persistence would apply to the results of this section as well; we leave the investigation of the interaction between these detailed Campaign #1 results and persistence to future research.

First, we decompose the effects of online advertising into offline versus online sales, showing that more than 90% of the impact is offline. We also demonstrate that most of the substantial impact on in-store sales occurs for users who merely view the ads but never click them. Second, we examine how the treatment effect varies with the number of ads viewed by each user. Third, we decompose the effects of online advertising into the probability of a transaction versus the size of the purchase conditional on a transaction.

We perform all of these analyses only on Campaign #1. Because the treatment and control groups were not re-randomized between campaigns, and because of the persistence results in the previous section, we know that the second and third campaigns cannot be analyzed cleanly on their own. For the results in this section, we continue to use our preferred specification of a difference in differences for three weeks before and after the start of Campaign #1, comparing treated versus untreated individuals.

A. Offline versus Online Sales and Views versus Clicks

In Table 9, we present a decomposition of the treatment effect into offline and online components by running the previous difference-in-differences analysis separately for offline and online sales. The first line of the table shows that the vast majority of the treatment effect comes from brick-and-mortar sales. The treatment effect of R$0.166 per treated individual turns out to consist of a R$0.155 effect on offline sales plus a R$0.011 effect on online sales. In other words, 93% of the treatment effect of the online ads occurred in offline sales.

In online advertising, the click-through rate (CTR) is a standard measure of performance. This measure (approximately 0.3% for the ads in this experiment) provides more information than is available in traditional media, but it still does not measure the variable that advertisers actually care most about: the differential impact on sales. An interesting question is, therefore, "To what extent do clicks on ads predict retail sales?"

We answer this question in the second and third lines of Table 9. We partition the set of treated individuals into those who clicked on an ad (line 2) versus those who merely viewed ads but did not click any of them (line 3). Of the 814,000 individuals treated with ads, 7.2% clicked on at least one ad, while 92.8% merely viewed them. The results are qualitatively different for offline versus online sales. For offline sales, those individuals who view but do not click ads purchase R$0.150 more than untreated individuals (a statistically significant difference). For online sales, the effect of viewing but not clicking is precisely measured to be very close to zero, so we can conclude that those who do not click do not buy online. In contrast, those who click show a large difference in purchase amounts relative to untreated individuals in both offline and online sales: R$0.215 and R$0.292, respectively. While this click effect is highly statistically significant for online sales, it is insignificant for offline sales due to a large standard error.

With respect to total sales, we see a treatment effect of R$0.139 on those who merely view ads, and a treatment effect of R$0.508 on those who click them. Our original estimate of the treatment effect can be decomposed into the effect of views versus clicks as follows, using their relative weights in the population: R$0.166 = (92.8%)(R$0.139) + (7.2%)(R$0.508). The first component, the effect on those who merely view but do not click ads, represents 78% of the total treatment effect. Thus clicks, though the standard performance measure in online advertising, fail to capture the vast majority of the effects on sales.
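Spelling out the weighting arithmetic behind that decomposition:

0.928 × 0.139 ≈ 0.129 (viewers)   and   0.072 × 0.508 ≈ 0.037 (clickers),

which sum to approximately 0.166, with the viewer share 0.129 / 0.166 ≈ 78%.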
We caution against interpreting this graph as the additional sales that would be generated by each marginal increase in ads shown to a given individual. That causal interpretation could be invalid because the number of ad views does not vary exogenously across individuals. Each individual has a browsing "type" that determines the distribution of pages they will visit on Yahoo!, and hence the average number of ads they receive. We know from the previous results in Table 4 that browsing behavior is correlated with retail purchases even in the absence of advertising on Yahoo!, so we shy away from the strongly causal interpretation we might like to make. On the other hand, there is some truly exogenous variation in ad views for a given individual: for example, the ad server has a random component in deciding whether to show this retailer's ad or another advertiser's ad on a given page, and this type of variation is uncorrelated with a user's type. To the extent that variation in the number of ads represents this kind of within-individual variation rather than variation between different types of users, a valid causal interpretation is possible, particularly locally.

The upward-sloping line on the graph represents our estimate of the retailer's cost of purchasing a given number of ad impressions per person. This line has a slope of approximately R$0.001, the price per ad impression to the retailer. The graph thus plots the nonlinear revenue curve against the linear cost curve for a given number of advertisements delivered to a given individual; the crossover at approximately 100 ad views is a breakeven point for revenue.

For individuals who viewed fewer than 100 ads (93.9% of the treatment group), the increased sales exceed the cost of the advertisements. To look at incremental profits rather than incremental revenues, we could assume a 50% retail profit margin and multiply the entire benefit curve by 50%, halving its vertical height. Given the shape of the curve, the breakeven point remains approximately the same, at around 100 ads per person. For the 6% of individuals who received more than 100 ads, the campaign might not have been cost-effective, though statistical uncertainty prevents this from being a firm conclusion. The retailer might therefore gain from a policy that caps the number of ad views per person at 100, because such a cap avoids spending money on individuals for whom there does not appear to be much positive benefit. This hypothesis could fruitfully be investigated in future experiments.
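For concreteness, the breakeven logic can be sketched as follows, assuming the price of R$0.001 per impression quoted above, the estimated benefit curve from the local linear fit, and the hypothetical 50% margin; it simply finds where the cost line first crosses the (margin-adjusted) benefit curve.

    import numpy as np

    PRICE = 0.001  # R$ per ad impression shown

    def breakeven(grid, benefit, margin=1.0):
        """First number of impressions at which cumulative ad cost strictly
        exceeds the margin-adjusted benefit estimate; None if it never does."""
        grid = np.asarray(grid)
        cost = PRICE * grid
        crossed = np.nonzero(cost > margin * np.asarray(benefit))[0]
        return int(grid[crossed[0]]) if crossed.size else None

    # breakeven(grid, curve)       -> revenue breakeven, roughly 100 ads here
    # breakeven(grid, curve, 0.5)  -> profit breakeven under a 50% margin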
C. How the Treatment Effect Varies with Age and Gender

Having seen how the treatment effect varies with ad frequency, we next investigate who actually sees the ads. Figure 14 shows a nonparametric regression of the average number of ads viewed on age. Exposure is highest for the younger age groups: between the ages of 25 and 35, the mean hovers around 45-50 ads per treated person, and it declines to approximately 33 for the oldest users. If the marginal effect of an ad were unconditionally increasing, we would therefore expect the treatment effect to be higher for younger age groups than for older ones.

To check this, Figure 15 presents a nonparametric regression of first-differenced sales on age interacted with the treatment dummy. As before, we use an Epanechnikov kernel (bandwidth of 6.5 years of age) and a local linear estimate of the conditional mean function. The age distribution of sample participants is most concentrated around 40 years of age, where we have the most data and hence the most power to identify a significant treatment effect. However, the largest treatment effects of the retailer's advertising appear around age 45 and remain large until age 70, at which point a lack of statistical significance bars further conclusions. The treatment effect therefore does not track ad frequency in the way an unconditionally increasing marginal effect would predict; age matters more for the effectiveness of the ads than frequency alone.

When the treatment effect is decomposed by gender, the difference is insignificant. Men are slightly more affected by the advertising campaign than women, with three-week DID estimates of R$0.201 (0.066) for men and R$0.141 (0.060) for women. However, the nonparametric plots of gender interacted with age in Figure 16 and Figure 17 show somewhat different behavior in the two estimates. The pointwise significance of the difference between the two has not been determined, but the point estimates are still of interest. Women show a pronounced treatment effect between ages 50 and 70, while men, who make up only 40.4% of the sample, are affected more gradually and less significantly, beginning around age 40 and increasing until age 70. Furthermore, women on average view significantly fewer ads than men across all age groups (35.5 versus 45.6 ads, p < 0.001), so for those age groups where the treatment effect is large and significant, the advertising is considerably more cost-effective per ad for women.

In sum, the nonparametric estimates of the treatment effect interacted with age and gender show significant differences in the effectiveness of the advertising across age groups, and suggestive, though not stark, differences between men and women.

D. Probability of Purchase versus Basket Size

So far, our analysis has focused on advertising's effects on the average purchase amount per person, including those who made purchases as well as those who did not. We can decompose the effects of advertising into two separate channels of interest to retailers: the effect on the probability of a transaction and the effect on "basket size," the purchase amount conditional on a transaction. To provide base numbers for reference: during the three-week period after the start of the campaign, individuals treated with ads had a 6.48% probability of a transaction, and the average basket size was R$40.72 for those who purchased. The product of these two numbers gives the average (unconditional) purchase amount of R$2.64 per person. We reproduce these numbers in the last column of Table 10 for comparison with our treatment-effect results.

The first column of Table 10 shows the estimates of the treatment effects on each variable of interest. As before, these come from a difference in differences comparing those treated with ads versus those untreated, using three weeks of data before and after the start of the campaign.

First, we investigate advertising's impact on the probability of a transaction,[17] with results shown in the first row of the table. We find an increase of 0.102 percentage points in the probability of purchase as a result of the advertising, and the effect is statistically significant (p=0.03).

[17] We include negative purchase amounts (net returns) as transactions in this analysis. Since we previously found that advertising decreases the probability of a negative purchase amount, the effect measured here would likely be larger if we restricted the analysis to positive purchase amounts.
Relative to the average transaction probability of 6.48%, this is an increase of approximately 1.6%.

Next, we consider the effect on basket size. Instead of computing this DID estimate via a regression on first-differenced sales as before, we use group means of basket size, paying careful attention to possible time-series correlation in order to compute a consistent standard error.[18] As shown in the second row of Table 10, the advertising campaign produced an increase in basket size of R$1.75, which is statistically significant (p=0.018). Compared with the baseline basket size of R$40.72, this represents an increase of approximately 4.5%.

[18] When comparing the mean time-series difference for treated individuals to the mean time-series difference for untreated individuals, the two means are independent, so standard errors are straightforward. But when computing a difference in differences of four group means, we should expect correlation between the pre-campaign and post-campaign basket-size estimates, since some individuals purchase in both periods and may have serially correlated sales.

To summarize, we initially found that the treatment caused an increase of R$0.166 in the average (unconditional) purchase amount. This decomposes into an increase of 0.102 percentage points in the probability of a transaction and an increase of R$1.75 in the purchase amount conditional on a transaction. Thus, about one-fourth of the treatment effect appears to be due to increases in the probability of a transaction, and about three-fourths to increases in basket size, as the sketch below verifies.
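As an arithmetic check on this split, the unconditional effect can be decomposed to first order as d(p*b) ≈ b*dp + p*db, where p is the transaction probability and b the basket size. The sketch below applies this approximation (our simplification, not the paper's exact estimator) to the point estimates in Table 10.

    # Treated-group levels and estimated treatment effects from Table 10.
    p, b = 0.0648, 40.72    # Pr(transaction) and mean basket size (R$)
    dp, db = 0.00102, 1.75  # estimated effects on each component
    prob_channel = dp * b                  # ~ R$0.042 from added transactions
    basket_channel = p * db                # ~ R$0.113 from larger baskets
    total = prob_channel + basket_channel  # ~ R$0.155, near the R$0.166 DID estimate
    print(prob_channel / total, basket_channel / total)   # ~ 0.27, 0.73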
VII. Conclusion

Despite the economic importance of the advertising industry, the effects of advertising on sales have been extremely difficult to quantify. In this study, we take a substantial step forward in this measurement problem by conducting a large-scale field experiment that systematically varies advertising to a subject pool of over one million retail customers on Yahoo! The randomized experiment allows us to establish causal effects of advertising on sales. Panel data on weekly individual transactions allow us to measure these effects rather precisely, despite the fact that sales at this retailer have high variance and this online advertising campaign is just one of many factors that influence purchases. The panel nature of the data also illuminates how vastly incorrect our measurements of the effect of advertising on sales would be if we attempted to estimate these effects from non-experimental cross-sectional variation in advertising exposure.

Our primary result is that retail advertising works! We find positive, sizeable, and persistent effects of online retail advertising on retail sales. The persistence is striking: we find measurable effects even several weeks after the last ad has been shown. In total, we estimate that the retailer gained incremental revenues more than ten times as large as the amount it spent on the online ads in this experiment.

Though it may be tempting to assume that online advertising would have disproportionately large effects on online retail sales, we find the reverse to be true. This particular retailer records 86% of its sales volume offline, and we estimate that 93% of our treatment effect occurred in offline sales. Online advertising has a large effect on offline sales.

Furthermore, though clicks are a standard measure of performance in online-advertising campaigns, we find that online advertising has even more substantial effects on those who merely view the ads than on those who actually click them. Clicks are a good predictor of online sales but not of offline sales. We decompose the total treatment effect to show that 78% of the lift in sales comes from those who view ads but do not click them, while only 22% can be attributed to those who click.

We find that the treatment effect of advertising is largest for those individuals who browsed Yahoo! enough to see between 25 and 100 ad impressions during a two-week period. We also find that online advertising increases both the probability of purchase and the average purchase amount, with about three-quarters of the treatment effect coming through increases in the average purchase amount. Finally, we find evidence suggesting that the effects of having advertised to customers may be countercyclical, in the sense that advertising's impact is greatest in weeks when total sales are smallest.

In future research, we hope to replicate these results with other retailers. We also wish to investigate related factors in online advertising, such as the value of targeting customers with particular demographic or online-browsing-behavior attributes that an advertiser may find desirable. The ability to conduct a randomized experiment with a million customers and to match individual-level sales and advertising data makes possible exciting new measurements of the economic effects of advertising. We believe John Wanamaker would be very interested in the results from this new frontier.
References

Abraham, Magid. "The Off-Line Impact of Online Ads." Harvard Business Review, April 2008, p. 28.

Abraham, Magid, and Len Lodish. "Getting the Most out of Advertising and Promotion." Harvard Business Review, May-June 1990, pp. 50-60.

Ackerberg, Daniel. "Empirically Distinguishing Informative and Prestige Effects of Advertising." RAND Journal of Economics, vol. 32, no. 2, Summer 2001, pp. 316-333.

Ackerberg, Daniel. "Advertising, Learning, and Consumer Choice in Experience-Good Markets: An Empirical Examination." International Economic Review, vol. 44, no. 3, August 2003, pp. 1007-1040.

Anderson, Eric, and Duncan Simester. "Dynamics of Retail Advertising: Evidence from a Field Experiment." Forthcoming, Economic Inquiry, 2008.

Bagwell, Kyle. "The Economic Analysis of Advertising." In Handbook of Industrial Organization, volume 3, Mark Armstrong and Robert Porter, eds. Amsterdam: Elsevier B.V., 2008, pp. 1701-1844.

Berndt, Ernst R. The Practice of Econometrics: Classic and Contemporary. Reading, Massachusetts: Addison-Wesley, 1991.

Cameron, A. C., and P. K. Trivedi. Microeconometrics. Cambridge University Press, 2005.

Dorfman, Robert, and Peter O. Steiner. "Optimal Advertising and Product Quality." American Economic Review, vol. 44, no. 5, December 1954, pp. 826-836.

Hu, Ye, Leonard M. Lodish, and Abba M. Krieger. "An Analysis of Real World TV Advertising Tests: A 15-Year Update." Journal of Advertising Research, vol. 47, no. 3, September 2007, pp. 341-353.

Levitt, Steven, and John A. List. "Field Experiments in Economics: The Past, the Present, and the Future." NBER Working Paper 14356, 2008.

Lewis, Randall. "Data Recovery: Least Squares Consistency for Inexact Merges." Working paper, MIT, 2008.

Lodish, Leonard M., Magid Abraham, Stuart Kalmenson, Jeanne Livelsberger, Beth Lubetkin, Bruce Richardson, and Mary Ellen Stevens. "How T.V. Advertising Works: A Meta-Analysis of 389 Real World Split Cable T.V. Advertising Experiments." Journal of Marketing Research, vol. XXXII, May 1995a, pp. 125-139.
Lodish, Leonard M., Magid Abraham, Stuart Kalmenson, Jeanne Livelsberger, Beth Lubetkin, Bruce Richardson, and Mary Ellen Stevens. "A Summary of Fifty-Five In-Market Experiments of the Long-Term Effect of TV Advertising." Marketing Science, vol. 14, no. 3, part 2, 1995b, pp. G133-G140.

Schmalensee, Richard. The Economics of Advertising. Amsterdam: North-Holland, 1972.
Tables and Figures

Table 1 – Summary Statistics for the Three Campaigns

Figure 1 – Time Plot of Treatment Saturation

Figure 2 – Yahoo! Front Page with Large Rectangular Advertisement
Table 2 – Basic Summary Statistics for Campaign #1

                                      Control    Treatment
    % Female                            59.5%        59.7%
    % Retailer Ad Views > 0              0.0%        63.7%
    % Yahoo Page Views > 0              76.4%        76.4%
    Mean Y! Page Views per Person         358          363
    Mean Ad Views per Person                0           25
    Mean Ad Clicks per Person               0        0.056
    % Ad Impressions Clicked (CTR)          -        0.28%
    % People Clicking at Least Once         -        4.59%

Figure 3 – Ad Views Histogram
Table 3 – Weekly Sales Summary

Figure 4 – Offline and Online Weekly Sales
Table 4 – Three-Week Treatment Effect Offline/Online Decomposition

                                      Before         During         Difference
                                      Campaign       Campaign       (During – Before)
                                      (2 weeks)      (2 weeks)
                                      Mean           Mean           Mean
                                      Sales/Person   Sales/Person   Sales/Person
    Control:                          R$ 1.95        R$ 1.84        -R$ 0.10
                                      (0.04)         (0.03)         (0.05)
    Treatment:                        1.93           1.89           -R$ 0.04
                                      (0.02)         (0.02)         (0.03)
      Exposed to Retailer's Ads:      1.81           1.81            R$ 0.00
                                      (0.02)         (0.02)         (0.03)
      Not Exposed to Retailer's Ads:  2.15           2.04           -R$ 0.10
                                      (0.03)         (0.03)         (0.04)

Figure 5 – Histogram of Campaign #1 Sales by Treatment and Control
Figure 6 – Difference between Treatment and Control Sales Histograms

Table 5 – Preliminary Two-Week Treatment Effect Estimates

                                    All Individuals    Excluding Page Views = 0
    Treatment-Control Difference    R$ 0.053           R$ 0.078
                                    (0.038)            (0.045)
    Rescaled Effect on Treated      0.083              0.093
                                    (0.059)            (0.054)

Figure 7 – Histogram of Difference in Three-Week Sales for Treated and Untreated Groups
Figure 8 – Difference in Treated and Untreated Three-Week Sales Histograms

Table 6 – Difference-in-Differences Treatment Effect Estimates

                                  Two-Week    Three-Week
    Difference in Differences     R$ 0.102    R$ 0.166
                                  (0.043)     (0.052)
Table 7 – Weekly Summary of Effect on the Treated

* For purposes of computing the treatment effect on the treated, we define "treated" individuals as having ever seen an ad in one of these campaigns up to that point in time.
** Estimates for Campaign #3 involve an imperfect merge to compute the difference in differences. See footnote 14.

Figure 9 – Three Campaign Weekly Treatment Effect Time Plot
Table 8 – Three Campaign Results Summary

Figure 10 – Weekly DID Specification Test
Figure 11 – Total Sales Histogram with Treatment Effect Estimates

Figure 12 – Plot and Regressions of Weekly Treatment Effect with Sales
Table 9 – Offline/Online Treatment Effect Decomposition

                                    Total Sales    Offline Sales    Online Sales
    Ads Viewed                      R$ 0.166       R$ 0.155          R$ 0.011
      [63.7% of Treatment Group]    (0.052)        (0.049)          (0.016)
    Ads Viewed, Not Clicked         R$ 0.139       R$ 0.150         -R$ 0.010
      [92.8% of Viewers]            (0.053)        (0.050)          (0.016)
    Ads Clicked                     R$ 0.508       R$ 0.215          R$ 0.292
      [7.2% of Viewers]             (0.164)        (0.157)          (0.044)

Figure 13 – Nonparametric Estimate of the Treatment Effect by Ad Viewing Outcome
Figure 14 – Nonparametric Estimate of the Average Campaign #1 Exposure by Age

Figure 15 – Nonparametric Estimate of the Treatment Effect by Age
Figure 16 – Nonparametric Estimate of the Treatment Effect for Females

Figure 17 – Nonparametric Estimate of the Treatment Effect for Males
Table 10 – Decomposition of Treatment Effect into Basket Size and Frequency Effects

                          3-Week DID            Treated Group
                          Treatment Effect      Level*
    Pr(Transaction)       0.102%                6.48%
                          (0.047%)
    Mean Basket Size      R$ 1.75               R$ 40.72
                          (0.74)
    Revenue Per Person    R$ 0.166              R$ 2.639
                          (0.052)

* Levels computed for those treated with ads during Campaign #1, using three weeks of data following the start of the campaign.
