Computational Social Science, Lecture 10: Online Experiments


  1. Experimental Design
     Sergei Vassilvitskii, Columbia University
     Computational Social Science, April 5, 2013
  2. Measurement
     “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”
     – John Wanamaker
  3. Measurement
     “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”
     – John Wanamaker, 1875
  4. Helping John
  5. Helping John
     Idea 1: Measure the final effect:
     – Track total store sales, compare to advertising budget
  6. Idea 1
     Idea 1: Measure the final effect:
     – Track total store sales, compare to advertising budget
     Findings:
     – Total sales typically higher after intense advertising
  7. Idea 1
     Idea 1: Measure the final effect:
     – Track total store sales, compare to advertising budget
     Findings:
     – Total sales typically higher after intense advertising
     Problems:
     – Stores advertise when people tend to spend
     – Christmas shopping periods
     – Travel during the summer
     – Ski gear in winter, etc.
  8. Correlation vs. Causation
  9. Idea 1
     A within-subject pre-test, post-test design.
  10. Idea 2
      “Measuring the online sales impact of an online ad or a paid-search campaign -- in which a company pays to have its link appear at the top of a page of search results -- is straightforward: we determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it.”
  11. Idea 2
      “Measuring the online sales impact of an online ad or a paid-search campaign -- in which a company pays to have its link appear at the top of a page of search results -- is straightforward: we determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it.”
      – Magid Abraham, CEO, President & Co-Founder of ComScore, in an HBR article (2008)
  12. Idea 2
      Measure the difference between people who see ads and those who don’t.
  13. Idea 2
      Measure the difference between people who see ads and those who don’t.
      Findings:
      – People who see the ads are more likely to react to them
  14. Idea 2
      Measure the difference between people who see ads and those who don’t.
      Findings:
      – People who see the ads are more likely to react to them
      Problems:
      – Ads are finely targeted. These are exactly the people who are likely to click!
      – Don’t advertise cars in fashion magazines.
      – Even more extreme online -- which ads are shown depends on the propensity of the user to click on them.
  15. Idea 3
      Matching:
      – Compare people who saw an ad with people who didn’t, but are otherwise “the same.”
  16. Idea 3
      Matching:
      – Compare people who saw an ad with people who didn’t, but are otherwise “the same.”
      Problems:
      – Hard to define “the same.” Beware of lurking variables.
  17. Ad Wear-out
      What is the optimal number of times to show an ad?
  18. Case Study: Ad Wear-out
      What is the optimal number of times to show an ad?
      Few:
      – Don’t want the user to be annoyed
      – No need to waste money if the ad is ineffective
      Many:
      – Make sure the user sees it
      – Reinforce the message
  19. Observational Study
      Look through the data:
      – Find the users who saw the ad once
      – Find the users who saw the ad many times
  20. Observational Study
      Look through the data:
      – Find the users who saw the ad once
      – Find the users who saw the ad many times
      Measure revenue for the two sets of users:
      – Conclusion: limit the number of impressions
  21. Correlations
      Why did some users only see the ad once?
      – They must use the web differently
      – Some sign on once a week to check email
      – Others are always online
  22. Correlations
      Why did some users only see the ad once?
      – They must use the web differently
      – Some sign on once a week to check email
      – Others are always online
      Correct conclusion:
      – People who visit the homepage often are unlikely to click on ads
      – We have not measured the effect of wear-out
  23. Idea 3
      Matching:
      – Compare people who saw an ad with people who didn’t, but are otherwise “the same.”
      Problems:
      – Hard to define “the same.” Beware of lurking variables.
  24. Simpson’s Paradox
      Kidney stones [real data]. You have kidney stones, and there are two treatments, A & B.
      – Empirically, treatment A is effective 78% of the time
      – Empirically, treatment B is effective 83% of the time
      – Which one do you choose?
  25. Simpson’s Paradox
      Digging into the data, you see:
      If the stones are large:
      – Treatment A is effective 73% of the time
      – Treatment B is effective 69% of the time
      If the stones are small:
      – Treatment A is effective 93% of the time
      – Treatment B is effective 87% of the time
  26. Simpson’s Paradox
      If the stones are large:
      – Treatment A is effective 73% of the time
      – Treatment B is effective 69% of the time
      If the stones are small:
      – Treatment A is effective 93% of the time
      – Treatment B is effective 87% of the time
      Overall:
      – Treatment A is effective 78% of the time
      – Treatment B is effective 83% of the time
  27. Simpson’s Paradox: Summary Stats

                   A                 B
      Small        81/87 (93%)       234/270 (87%)
      Large        192/263 (73%)     55/80 (69%)
      Combined     273/350 (78%)     289/350 (83%)
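The reversal in the summary stats can be checked directly from the counts on the slide. A short sketch (the figures below are the slide's own kidney-stone data):

```python
# Simpson's paradox: treatment A wins within each stratum, yet loses
# overall, because stone size (a lurking variable) is correlated with
# which treatment a patient received.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    return successes / total

for treatment, strata in data.items():
    for size, (s, n) in strata.items():
        print(f"{treatment} {size}: {s}/{n} = {rate(s, n):.0%}")
    s_all = sum(s for s, _ in strata.values())
    n_all = sum(n for _, n in strata.values())
    print(f"{treatment} combined: {s_all}/{n_all} = {rate(s_all, n_all):.0%}")
```

A is better on small stones (93% vs. 87%) and on large stones (73% vs. 69%), yet B looks better combined (83% vs. 78%): A was given mostly the harder, large-stone cases, which drags its pooled rate down.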
  28. Idea 3
      Matching:
      – Compare people who saw an ad with people who didn’t, but are otherwise “the same.”
      Problems:
      – Hard to define “the same.” Beware of lurking variables.
      – Simpson’s Paradox
  29. Getting at Causation
      Randomized, controlled experiments:
      – Select a target population
      – Randomly decide whom to show the ad
      – Subjects cannot influence whether they are in the treatment or control group
  30. Measuring Wear-out
      [diagram: a parallel universe for each user]
  31. Measuring Wear-out
      [diagram: parallel universes, split into Control and Treatment]
  32. Measuring Wear-out
      [diagram: parallel universes, two Control/Treatment splits]
  33. Creating Parallel Universes
      When a user first arrives:
      – Check the browser cookie, assign to the control or treatment group
      – Control group: shown a PSA
      – Treatment group: shown the ad
      – The treatment stays the same on repeated visits
  34. Creating Parallel Universes
      When a user first arrives:
      – Check the browser cookie, assign to the control or treatment group
      – Control group: shown a PSA
      – Treatment group: shown the ad
      – The treatment stays the same on repeated visits
      Advertising effects:
      – Positive!
      – But smaller than reported through observational studies
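The cookie-based bucketing just described can be sketched as below. The salt and hashing scheme are illustrative assumptions, not the actual production implementation; the point is that assignment is random across users but deterministic per cookie, so repeat visits get the same arm:

```python
import hashlib

EXPERIMENT_SALT = "wearout-2013"  # hypothetical per-experiment salt

def assign_group(cookie_id: str) -> str:
    """Deterministically assign a cookie to an arm: the same user
    gets the same treatment on every repeat visit."""
    digest = hashlib.sha256(f"{EXPERIMENT_SALT}:{cookie_id}".encode()).hexdigest()
    # The hash is effectively uniform, so the parity of its value
    # splits traffic roughly 50/50 between the two arms.
    return "control" if int(digest, 16) % 2 == 0 else "treatment"

# Control is shown the PSA, treatment the real ad; assignment is stable:
assert assign_group("cookie-42") == assign_group("cookie-42")
```

Salting per experiment matters: without it, the same users would land in the same arm of every experiment, reintroducing correlations between studies.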
  35. Online Experiments
      Advantages:
  36. Online Experiments
      Advantages:
      – Can reach tens of millions of people!
        • Can estimate very small effects. Lewis et al., “Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising” (WWW 2011) estimate effects of 0.01%!
  37. Online Experiments
      Advantages:
      – Can reach tens of millions of people!
        • Can estimate very small effects. Lewis et al., “Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising” (WWW 2011) estimate effects of 0.01%!
      – Can be relatively cheap (Mechanical Turk)
  38. Online Experiments
      Advantages:
      – Can reach tens of millions of people!
        • Can estimate very small effects. Lewis et al., “Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising” (WWW 2011) estimate effects of 0.01%!
      – Can be relatively cheap
      – Can recruit diverse subjects
        • “20 students in a large Midwestern university.” Try to avoid drawing subjects only from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).
  39. WEIRD People
      Which line is longer?
      – Henrich, Joseph; Heine, Steven J.; Norenzayan, Ara (2010): “The weirdest people in the world?” Working Paper Series des Rates für Sozial- und Wirtschaftsdaten.
  40. WEIRD People
      [figure]
  41. Online Experiments
      Advantages:
      – Can reach tens of millions of people!
        • Can estimate very small effects.
      – Can be relatively cheap
      – Can recruit diverse subjects
        • “20 students in a large Midwestern university.” Try to avoid drawing subjects only from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).
      – Access: subjects in other countries, geographically diverse
      – Can be quick
  42. Online Experiments
      Advantages:
      – Can reach tens of millions of people!
        • Can estimate very small effects.
      – Can be relatively cheap
      – Can recruit diverse subjects
        • “20 students in a large Midwestern university.” Try to avoid drawing subjects only from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).
      – Access: subjects in other countries, geographically diverse
      – Can be quick
      Challenges:
      – Limited choice in the range of treatments (no MRI studies)
      – Do people behave differently offline?
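A rough sense of why the "tens of millions" scale matters for estimating 0.01% effects: a textbook two-proportion power calculation puts the required sample in the millions. The formula is standard; the 1% baseline click rate below is an assumption for illustration, not a figure from the lecture:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p: float, delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size for a two-proportion z-test
    detecting a lift of `delta` over a baseline rate `p`."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for a two-sided 5% test
    z_b = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    p_bar = p + delta / 2                      # average rate across the two arms
    return ceil((z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar) / delta ** 2)

# A 0.01-percentage-point lift on an assumed 1% baseline click rate:
print(n_per_arm(0.01, 0.0001))  # roughly 15-16 million users per arm
```

Effects that small are simply invisible to a lab study with dozens of subjects; only web-scale experiments can resolve them.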
  43. External Validity
      A major challenge in all lab experiments:
      – Virtual and physical labs
      – Do findings hold outside the lab?
      Enter: natural experiments
  44. Natural Experiments
      The experimental condition:
      – Is not decided by the experimenter
      – But is exogenous (subjects have no influence on which condition they get)
  45. Case Study: Ad Wear-out
      Back to ad wear-out.
      Few:
      – Don’t want the user to be annoyed
      – No need to waste money if the ad is ineffective
      Many:
      – Make sure the user sees it
      – Reinforce the message
      Natural Experiment:
      – When there were two competing campaigns, the Yahoo! ad server decided which campaign to show at random!
      – This was by engineering design -- both campaigns got an equal share of pageviews. (Simpler and easier to distribute than a round-robin system.)
  46. Case Study: Ad Wear-out
      Natural Experiment:
      – When there were two competing campaigns, the Yahoo! ad server decided which campaign to show at random!
      – This was by engineering design -- both campaigns got an equal share of pageviews. (Simpler and easier to distribute than a round-robin system.)
      Experiments:
      – Compare the behavior of people who saw the same total number of ads, but a different number from each campaign.
  47. Case Study: Ad Wear-out
      Yes:
      – Some advertisements see a 5x drop in click-through rate after the first exposure
      – These typically have very high click-through rates
      No:
      – Others see no decrease in click-through rate even after ten exposures
      – These have lower, but steady, click-through rates
  48. Case Study 2: Yelp
      Does a higher Yelp rating lead to higher revenue? How to do the experiment?
  49. Case Study 2: Yelp
      Does a higher Yelp rating lead to higher revenue? How to do the experiment?
      – Observational -- no causality.
      – Controlled -- would require deception.
      – Natural?
  50. Case Study 2: Yelp
      Natural Experiment:
      – Yelp rounds ratings to the nearest half star.
      – 4.24 becomes 4 stars; 4.26 becomes 4.5 stars.
  51. Case Study 2: Yelp
      Natural Experiment:
      – Yelp rounds ratings to the nearest half star.
      – 4.24 becomes 4 stars; 4.26 becomes 4.5 stars.
      Data:
      – Raw ratings from Yelp
      – Restaurant revenue (from tax records)
  52. Case Study 2: Yelp
      Natural Experiment:
      – Yelp rounds ratings to the nearest half star.
      – 4.24 becomes 4 stars; 4.26 becomes 4.5 stars.
      Data:
      – Raw ratings from Yelp
      – Restaurant revenue (from tax records)
      – Finding: a one-star increase leads to a 5-9% increase in revenue.
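The rounding rule driving the natural experiment is easy to state in code; the exact rule below is inferred from the slide's 4.24/4.26 example:

```python
def displayed_stars(raw_avg: float) -> float:
    """Round a raw average rating to the nearest half star for display."""
    return round(raw_avg * 2) / 2

# Nearly identical restaurants land on opposite sides of the threshold:
assert displayed_stars(4.24) == 4.0
assert displayed_stars(4.26) == 4.5
```

Because a 4.24 restaurant and a 4.26 restaurant are essentially identical in quality but show different stars, comparing revenue just below and just above each threshold isolates the causal effect of the displayed rating, in the style of a regression-discontinuity design.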
  53. Case Study 3: Badges
      How do badges influence user behavior?
      Specifically:
      – The “Epic” badge on Stack Overflow.
      – Awarded after hitting the maximum number of points (through posts, responses, etc.) on 50 distinct days.
  54. Case Study 3: Badges
      How do badges influence user behavior?
      Specifically:
      – The “Epic” badge on Stack Overflow.
      – Awarded after hitting the maximum number of points (through posts, responses, etc.) on 50 distinct days.
      Experimental Design:
      – Within-subject pre/post test (again)
      – Look at user behavior before/after receiving the badge
      – Average over different users and different timings to (hopefully) wash out all other factors.
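The pre/post design just described can be sketched as follows. The data layout and the 30-day window are illustrative assumptions, not the study's actual parameters:

```python
from statistics import mean

def pre_post_effect(daily_activity: dict, badge_day: int, window: int = 30) -> float:
    """Difference in a user's mean daily activity between the `window`
    days after receiving the badge and the `window` days before.
    `daily_activity` maps day number -> activity count (missing day = 0)."""
    pre = [daily_activity.get(d, 0) for d in range(badge_day - window, badge_day)]
    post = [daily_activity.get(d, 0) for d in range(badge_day + 1, badge_day + window + 1)]
    return mean(post) - mean(pre)

def average_badge_effect(users) -> float:
    """Average the per-user effect; with many users and staggered badge
    days, site-wide trends and seasonality hopefully average out."""
    return mean(pre_post_effect(acts, day) for acts, day in users)
```

The staggering is what makes the average meaningful: if every user got the badge on the same day, any site-wide change on that day would masquerade as a badge effect.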
  55. Case Study 3: Badges
      Results: [figure]
  56. Overall
      Experimental design is hard!
      – Be extra skeptical in your analyses. There are lots of spurious correlations.
      Experiments:
      – Natural and controlled experiments are the best way to measure effects.
      Observational Data:
      – Sometimes the best you can do
      – Can lead to interesting descriptive insights
      – But beware of correlations!
