Breaking Wanamaker’s Curse: Randomized Experiments in Advertising
At the Advertising Research Foundation’s 2011 Annual re:think convention, Duncan Watts and David Reiley, Principal Research Scientists at Yahoo! Labs, gave a presentation on randomized experiments in advertising, entitled “Breaking Wanamaker’s Curse.” The presentation shows how to use randomized experiments to measure advertising effectiveness.
1. Breaking Wanamaker’s Curse: Randomized Experiments in Advertising
   Duncan Watts, Principal Research Scientist, Yahoo! Labs
   David Reiley, Principal Research Scientist, Yahoo! Labs
2. “Half the money I spend on advertising is wasted. The trouble is, I don’t know which half.”
   – John Wanamaker (1838-1922)
3. That Was 100 Years Ago
• John Wanamaker died in 1922
  – Before quantum mechanics, penicillin, radar, lasers, jet engines, space travel, computers, the Internet…
• How is it that we have made such dramatic progress in the physical and engineering sciences, yet still don’t know how to measure ad effectiveness?
4. Two Responses
1. “Measuring advertising effectiveness is impossible!”
   – We know that advertising works, but its effects are too long-term, intangible, etc. to measure
2. “We do know how to measure ad effectiveness, and do it all the time!”
   – Metrics have improved vastly, as have statistical (marketing mix) models
5. The “It’s Impossible” Response
• How is it possible both to “know” that advertising works and also that its effects can’t be measured?
  – These seem like logically contradictory statements
• The resolution seems to be that although the effects of individual campaigns are “impossible” to measure, it seems obvious that advertising as a whole “works”
• But this is still problematic
  – We also know that at some level medical science “works,” but this doesn’t mean we don’t try to measure the effectiveness of particular procedures, drugs, etc.
  – It is also not clear that the effectiveness of campaigns can’t be measured
• Why not apply the same standards of evidence to advertising that we do to medical (and other) science?
6. The “Already Done” Response
• It is true that advertising is increasingly metrics-driven
  – Nielsen, comScore, survey and panel data, market research, server data, sales data
• It is also true that statistical modeling has become more sophisticated
• But observational data alone reveal only correlations between sales and advertising, not whether advertising causes sales
  – Statistical models try to “identify” causal effects, but this is notoriously hard to do
7. Correlation vs. Causation
• “Everyone knows that correlation is not causation”
• But it is remarkably easy to get them mixed up in practice
  – If Mary went on a diet and also lost 20 lbs, the diet caused the weight loss, right?
    • Well, maybe, but she was probably doing other things as well (watching what she eats in general, exercising, etc.)
    • Nevertheless, we focus on the diet and tell a plausible story
• This is especially true of human behavior
  – We have so much intuition that we can always come up with a plausible story about why someone did what they did
  – It is tempting to infer causality when in reality all we know is that “X happened and then Y happened”
8. Problems with Causal Inference from Observational Data
• Marketing mix modeling (MMM) exploits observational variation in advertising levels to estimate a model
  – Advertising levels must vary so that we can estimate slope coefficients
• Key question: what causes the variation in advertising?
• Reverse causality?
  – Advertisers set their budgets based on the previous year’s sales
• Correlations will be strong even if ads have no effect
  – Big product launches attract large ad campaigns
  – Advertisers spend more during holidays
9. Selection Problems
• An HBR article by the founder and president of comScore (Abraham, 2008) illustrates the state of the art:
  – It compares those who saw an online ad with those who didn’t, and measures huge effects for search ads, smaller ones for display ads
  – Potential problem: the two samples do not come from the same population
  – Example: who sees an ad for eTrade on Google?
    • Those who search for “online brokerage” and similar keywords
    • Does the ad actually cause the difference in sales?
• In general, people who see a given ad are not the same as people who don’t see it
  – They read certain magazines, browse certain websites, search for certain terms, or visit certain places
  – By focusing on the most likely consumers, targeting also makes it harder to identify the “marginal” consumer
10. What’s the Solution?
• Randomized, controlled experiments
  – Randomly assign everyone in the target population to “treatment” and “control” groups
  – Only the treatment group is exposed to the campaign
  – Measure the difference in outcomes
• Randomization is the key
  – Only by randomizing treatment can the various confounding factors be eliminated
• In medical science, this is uncontroversial
  – Observational studies also suffer from the selection problem
    • People who select into a particular treatment or behavior are different from people who don’t
  – Randomized trials are therefore the gold standard
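To make the selection problem, and its experimental fix, concrete, here is a minimal simulation sketch in Python. It is not from the deck: the data, the browsing-exposure mechanism, and every number are invented for illustration. Heavier browsers are more likely to be exposed to ads but are given a lower baseline propensity to buy (the pattern the case study documents later), so the naive exposed-vs-unexposed comparison is biased even though the true ad effect is positive, while the randomized treatment-control comparison is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
true_effect = 0.10  # hypothetical per-exposure sales lift (made up)

# Assumption for illustration: heavy browsers see more ads but have a
# lower baseline propensity to buy from the retailer.
browsing = rng.exponential(1.0, n)
baseline = np.clip(rng.normal(2.0 - 0.2 * browsing, 1.0), 0, None)

# Random assignment: 80% treatment, 20% control (as in the case study).
treated = rng.random(n) < 0.8
# Exposure is endogenous: only treated heavy browsers actually see ads.
exposed = treated & (rng.random(n) < browsing / (1.0 + browsing))
sales = baseline + true_effect * exposed

# Naive observational comparison: exposed vs. unexposed (biased).
naive = sales[exposed].mean() - sales[~exposed].mean()
# Experimental comparison: treatment vs. control (unbiased for the
# intent-to-treat effect = true_effect * exposure rate among treated).
itt = sales[treated].mean() - sales[~treated].mean()

print(f"naive exposed-vs-unexposed estimate: {naive:+.3f}")
print(f"randomized intent-to-treat estimate: {itt:+.3f}")
print(f"true intent-to-treat effect:         "
      f"{true_effect * exposed[treated].mean():+.3f}")
```

Because exposure is correlated with the (low-purchasing) heavy-browser trait, the naive estimate here even comes out negative, exactly the inversion the case study below exhibits; only the randomized contrast recovers the truth.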
11. Can It Be Done?
• Experiments are hard to design properly, and harder to implement
  – It is often hard to control who sees an ad
  – It is also hard to measure outcomes, which may be distant in space and time from the treatment
• However, it can be done
  – Split-cable TV experiments in the 1980s/90s
  – Direct-mail marketing
  – Increasingly possible in online advertising
• David Reiley, Randall Lewis, and colleagues at Yahoo! Labs have conducted a series of groundbreaking field experiments
  – Over to David…
12. At Yahoo! Labs, We Are Putting the Experimental Method to Good Use
• Several different clients have agreed to careful experiments using a control group:
  – Three retailers (offline sales data!)
  – Several online service providers
  – Internal Yahoo! properties
• A variety of outcomes can be measured:
  – Online sales or other conversions
  – Offline sales, in special cases
  – Survey questions (brand affinity, etc.)
  – Online searches
13. Case Study: A Retailer Found Large Effects of Yahoo! Display Ads on Existing Customers
• This retailer keeps careful records, attributing >90% of in-store purchases to the correct individual customer
• We found 1.6 million customers who matched (by name and address, either email or snail mail) between the databases of the retailer and Yahoo!
• 80% of matched customers were assigned to the treatment group
  – Targeted with retail-image ad campaigns from the retailer
• 20% were assigned to the control group
  – They do not see these retailer ads
• Ad campaigns are “Run of Network” on Yahoo!
• Following the online ad campaigns, we received both online and in-store sales data for each week, for each person
  – A third party de-identifies observations to protect customer identities
  – The retailer disguises all sales amounts (R$) with a scalar multiple of USD
14. Descriptive Statistics Indicate a Valid Treatment-Control Randomization

                                        Control    Treatment
   % Female                             59.5%      59.7%
   % Retailer Ad Views > 0              0%         63.7%
   % Y! Page Views > 0                  76.4%      76.4%
   Mean Y! Page Views per Person        358        363
   % Ad Impressions Clicked (CTR)       -          0.28%
   % Viewers Clicking at Least Once     -          7.21%
15. Experimental Differences Show a Positive Increase in Sales Due to the Ads

   Mean Sales/Person       During Campaign (2 wks)
   Control Group           R$1.84 (0.03)
   Treatment Group         R$1.89 (0.02)

   95% C.I. for the treatment effect: R$0.05 ± 0.07. (For the treatment effect on the treated: R$0.08 ± 0.11.)
   (Standard errors in parentheses.)
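As a quick check on the slide’s interval (our arithmetic, not the deck’s), the difference in means and its standard error combine in the usual way:

\[
\hat{\Delta} = 1.89 - 1.84 = 0.05, \qquad
\mathrm{SE}(\hat{\Delta}) \approx \sqrt{0.02^2 + 0.03^2} \approx 0.036,
\]
\[
95\%\ \mathrm{C.I.} \approx 0.05 \pm 1.96 \times 0.036 \approx \mathrm{R\$}0.05 \pm 0.07 .
\]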
16. Suppose We Had No Experiment, and Just Compared Spending by Those Who Did or Did Not See Ads

   Mean Sales/Person           During Campaign (2 wks)
   Control Group               R$1.84 (0.03)
   Treatment Group             R$1.89 (0.02)
     Exposed (64% of TG)       R$1.81 (0.02)
     Unexposed (36% of TG)     R$2.04 (0.03)

   We would conclude that ads decrease sales by R$0.23 per person! We are not comparing apples to apples here.
   (Standard errors in parentheses.)
17. Pre-Campaign Data Emphasize that the Non-Experimental Sales Differences Have No Causal Relationship to the Ad Exposures

   Mean Sales/Person           Before Campaign (2 wks)    During Campaign (2 wks)
   Control Group               -                          R$1.84 (0.03)
   Treatment Group             -                          R$1.89 (0.02)
     Exposed (64% of TG)       R$1.81 (0.02)              R$1.81 (0.02)
     Unexposed (36% of TG)     R$2.15 (0.03)              R$2.04 (0.03)

   People who browse enough to see the ads also have a lower baseline propensity to purchase from the retailer!
   (Standard errors in parentheses.)
18. Ad Exposures Appear to Have Prevented a Normal Decline in Sales During this Time Period

   Mean Sales/Person           Before (2 wks)     During (2 wks)     Difference (During - Before)
   Control Group               R$1.95 (0.04)      R$1.84 (0.03)      -R$0.10 (0.05)
   Treatment Group             R$1.93 (0.02)      R$1.89 (0.02)      -
     Exposed (64% of TG)       R$1.81 (0.02)      R$1.81 (0.02)      R$0.00 (0.03)
     Unexposed (36% of TG)     R$2.15 (0.03)      R$2.04 (0.03)      -R$0.10 (0.04)

   The control group falls. The untreated group falls. The treated group holds constant.
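Written out, the difference-in-difference logic behind the next slide’s estimate (our reconstruction from the table above) nets the seasonal decline out of the exposed group’s change:

\[
\widehat{\mathrm{DID}}
= \big(\bar{Y}^{\mathrm{exp}}_{\mathrm{during}} - \bar{Y}^{\mathrm{exp}}_{\mathrm{before}}\big)
- \big(\bar{Y}^{\mathrm{unexp}}_{\mathrm{during}} - \bar{Y}^{\mathrm{unexp}}_{\mathrm{before}}\big)
\approx 0.00 - (-0.10)
\approx \mathrm{R\$}0.10,
\]

which matches the R$0.102 estimate on the next slide up to the rounding of the table entries.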
19. Our Difference-in-Difference Estimate Yields a Statistically and Economically Significant Effect
• Estimated effect per customer of viewing ads
  – Mean = R$0.102, SE = R$0.043
• Estimated sales impact for the retailer
  – R$83,000 ± 70,000
    • 95% confidence interval
    • Based on 814,052 treated individuals
    • Compare with a cost of about R$25,000
    • A 325% increase in revenue relative to cost
• Note the wide confidence interval. But it is actually much narrower than that for a “successful” IRI BehaviorScan test
  – That would be more like R$83,000 ± 190,000
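Scaling the per-customer estimate up to the campaign total (again our arithmetic, as a check on the slide):

\[
0.102 \times 814{,}052 \approx \mathrm{R\$}83{,}000, \qquad
1.96 \times 0.043 \times 814{,}052 \approx \mathrm{R\$}70{,}000 .
\]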
20. What Happens After the Two-Week Campaign Is Over?
• Positive effects during the campaign could be followed by
  – Negative effects (intertemporal substitution)
  – Equal sales (a short-lived effect of advertising)
  – Higher sales (persistence beyond the campaign)
• We can distinguish between these hypotheses by looking at the week following the two weeks of the campaign
21. We See a Positive Impact on Sales in the Week After the Campaign Ends
• Previous two-week estimate
  – R$0.102 (0.043) per person
• Estimate for the third week
  – R$0.061 (0.024) per person
  – As large as the effect per week during the campaign
• Including the third week, the total impact of the ads becomes
  – R$135,000 ± 85,000
  – Compared with a cost of R$25,000
• Extending out five weeks, the total looks as high as R$250,000 ± 190,000 (compared with a cost of R$33,000)
(Standard errors in parentheses.)
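The three-week total is roughly consistent with the per-person estimates (our arithmetic):

\[
(0.102 + 0.061) \times 814{,}052 \approx \mathrm{R\$}133{,}000,
\]

close to the slide’s R$135,000, with the small gap attributable to rounding and estimator details.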
22. We Break Down the Experimental Lift by Sales Channel

   3-Week Sales Lift/Person    Total            Offline          Online
   Viewed Ads                  R$0.166 (0.05)   R$0.155 (0.05)   R$0.011 (0.02)

   93% of the effect occurs in stores!
   (Three-week difference-in-difference estimator.)
23. Do We Capture the Effects of Ads by Measuring Only Clicks? No.

   3-Week Sales Lift/Person    Total            Offline          Online
   Viewed Ads                  R$0.166 (0.05)   R$0.155 (0.05)   R$0.011 (0.02)
   Clicked [7%]                R$0.508 (0.02)   R$0.215 (0.02)   R$0.292 (0.03)
   Didn’t Click [93%]          R$0.139 (0.05)   R$0.150 (0.05)   -R$0.010 (0.02)

   78% of the total lift comes from viewers who never clicked! With an experiment, no attribution model is required.
   (Three-week difference-in-difference estimator.)
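The 78% figure follows from weighting each subgroup’s lift by its share of viewers (our arithmetic):

\[
\frac{0.93 \times 0.139}{0.07 \times 0.508 + 0.93 \times 0.139}
\approx \frac{0.129}{0.165} \approx 78\%.
\]

Note that the denominator also reproduces the R$0.166 total lift, a useful internal consistency check.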
24. The Increase Appears to Consist of About a ¼ Increase in Transactions, ¾ Increase in Basket Size
• Prob(transaction) increases by 0.1 percentage points (s.e. 0.05)
  – Baseline amount = 6.5%
  – Percentage increase = 1.5%
• Mean basket size increases by R$1.75 (0.74)
  – Baseline amount = R$41
  – Percentage increase = 4.2%
• Both effects are statistically significant at the 5% level
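A back-of-envelope decomposition consistent with these numbers (our arithmetic, not the deck’s): since sales per person ≈ Prob(transaction) × mean basket size, the lift splits approximately as

\[
\underbrace{0.001 \times 41}_{\approx\,\mathrm{R\$}0.04}
+ \underbrace{0.065 \times 1.75}_{\approx\,\mathrm{R\$}0.11}
\approx \mathrm{R\$}0.155,
\]

i.e. roughly one quarter of the offline lift from additional transactions and three quarters from larger baskets.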
25. Wasted Half? Older Customers’ Purchases Responded More than Younger Ones’
   [Figure: kernel-smoothed treatment-control difference in sales by customer age. Epanechnikov kernel, bandwidth ~2 years.]
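For readers who want to reproduce this kind of figure, here is a minimal Python sketch of Epanechnikov kernel smoothing of a treatment-control difference by age. The smoothing function is generic; the data below are synthetic stand-ins (the experiment’s data are private), and all names and numbers beyond the kernel and bandwidth are ours.

```python
import numpy as np

def epanechnikov_smooth(x, y, grid, bandwidth=2.0):
    """Nadaraya-Watson regression of y on x with an Epanechnikov kernel."""
    out = np.full(len(grid), np.nan)
    for i, g in enumerate(grid):
        u = (x - g) / bandwidth
        w = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)
        if w.sum() > 0:
            out[i] = np.average(y, weights=w)
    return out

# Synthetic stand-in for the experiment's (private) data.
rng = np.random.default_rng(1)
age_t = rng.uniform(18, 80, 40_000)             # treatment-group ages
age_c = rng.uniform(18, 80, 10_000)             # control-group ages
sales_t = rng.normal(1.9 + 0.004 * age_t, 5.0)  # made-up age gradient
sales_c = rng.normal(2.0, 5.0, 10_000)

ages = np.arange(20, 80)
diff = (epanechnikov_smooth(age_t, sales_t, ages)
        - epanechnikov_smooth(age_c, sales_c, ages))
# `diff` approximates the smoothed treatment-control sales difference
# by age, the quantity plotted on this slide.
```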
26. We Designed a Second Study to Measure the Impact of Frequency
• This time, we have 3 million matched users
• Two campaigns in two weeks
• Three equal-sized treatment groups
  – Control (no ads)
  – Half frequency (17 impressions/person on average)
  – Full frequency (34 impressions/person on average)
• This time, we deliver Y! house ads as control impressions
  – They mark hypothetical views of the Control group
  – They also mark the views the Half group would have seen in the Full group
• Again we see in-store and online transaction data for each customer during the experiment
• Real US dollars this time
27. Again, Randomization Looks Good
28. Frequency Has Surprisingly High Marginal Impact When Going from 17 to 34 Ads Per Person

   Purchases During 2 Weeks    Control          Half             Full
   Mean purchase amount        $17.62 (0.17)    $17.98 (0.17)    $18.22 (0.17)
   Difference from Control     -                $0.36 (0.24)     $0.60 (0.24)

   Doubling the frequency increases sales by 50% more. Diminishing returns are not as high as we might have thought!
   (We include the 60% of users who viewed either a treatment or a control ad. Results aggregate the effects of both campaigns; the second week showed much larger effects than the first.)
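The implied marginal effect of the second 17 impressions (our arithmetic from the table):

\[
0.60 - 0.36 = \$0.24 \text{ per person},
\]

about two-thirds of the $0.36 effect of the first 17 impressions, i.e. well short of the steep diminishing returns one might expect.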
29. The Impact of the Ads Is Greatest for Those Who Live Within Two Miles of a Retail Store
• $0.50 average effect for all customers: 7X ROAS
• $3.00 average effect for those within 2 miles of a store: 36X ROAS!
30. At Yahoo!, We Can Also Measure the Increase in Searches Due to Display Ads
   In the treatment group, 1,300 searches take place within 5 minutes of an ad impression in our retail ad campaign.
31. Without an Experiment, We Would Have Overestimated the Effects
   True incremental impact: 560 searches, not 1,314.
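Put differently (our arithmetic): attributing all 1,314 post-impression searches to the ads would overstate the true causal effect by a factor of

\[
\frac{1{,}314}{560} \approx 2.3 .
\]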
32. Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising
• Activity bias: people who are doing one activity online are much more likely to be doing other online activities at around the same time
• Lewis, Rao, and Reiley (2011), “Here, There, and Everywhere,” documents several examples of activity bias
  – The impact of display ads on search queries for an advertiser’s brand
  – The impact of display ads on online conversions (new account applications)
  – The impact of Y! video ads shown on Amazon Mechanical Turk on Y! page views
• In each case, failing to use an experiment gives an overestimate of the true causal effect
33. We Intend to Do Lots More Ad-Effectiveness Experiments at Yahoo!
• We are currently building an experimentation platform to make the process easy and scalable
• We’re pleased to introduce Ken Mallon and AdLabs, an entire unit devoted to measuring effectiveness
• We’re looking for clients interested in using experiments to measure online ad effectiveness
  – Willing to share online or offline conversion data
  – Willing to give up reach in order to invest in information about what works best
    • Targeting
    • Creative
    • Frequency
    • Etc.
34. Lots of Yahoos to Thank, Including
• Randall Lewis
• Taylor Schreiner
• Valter Sciarillo
• Meredith Gordon
• Christine Turner
• Iwan Sakran
• Sergiy Matusevych
• Garrett Johnson
• Justin Rao
• Ken Mallon
• Erin Carlson
• Rick Grimes
• Melissa Chickering
• Jim Zepp
35. The Problem with Obvious
• Paradoxically, our intuition for human behavior may actually impede our understanding of it
• We can always imagine how advertising will affect people
  – That is not the same as knowing how it affects people
• Similar problems arise in business, government, and science as well
  – “It’s not rocket science”
  – Ironic, because we’ve made more progress in rocket science since Wanamaker’s day than in dealing with human affairs
• Read more at everythingisobvious.com