
Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

From the SMX Advanced Conference in Seattle, Washington, June 22-23, 2016. SESSION: Lies, Damned Lies, & Search Marketing Statistics. PRESENTATION: Lies, Damned Lies, & Search Marketing Statistics - Given by Adria Kyne, @adriak - Vistaprint, SEO Manager, North America and AU/NZ. #SMX #12C3



  1. 1. #SMX #12C3 @AdriaK How to Avoid the First Two When Producing the Latter Lies, Damned Lies, and Search Marketing Statistics Adria Kyne Vistaprint
  2. 2. #SMX #12C3 @AdriaK Problems • Using samples that are too small • Using significance as a stopping point for a test Solutions • More rigor with fixed-sample tests • Using sequential sampling tests • Bayesian testing Bonus Pro Tip for achieving valid samples Today’s Topics
  3. 3. #SMX #12C3 @AdriaK 1) Make sure that we understand what actually happened 2) Be sure that we can use these results to predict the future What is the Whole Point of This Anyway?
  4. 4. #SMX #12C3 @AdriaK 1. We want to know whether the variation is better, worse, or the same as the original. 2. We don’t want to see a positive outcome that isn’t really there— a false positive or Type I error 3. We don’t want to miss a positive outcome—a Type II error. Basics of Hypothesis Testing
  5. 5. #SMX #12C3 @AdriaK Your product page has an average 2.0% CR. You make a bunch of tweaks to the design, and after 30,000 visits, your CR is 2.25%. You think you’re a genius, and so you tell your boss. Score! #1 A Common (Sad) Story
  6. 6. #SMX #12C3 @AdriaK At the end of the month, your revenue is no higher. You look bad. The change you saw was not “significant,” because your sample size wasn’t big enough. Yes, 30,000 visits was not enough. You spoke too soon.
  7. 7. #SMX #12C3 @AdriaK I gotta be cruel to be kind.
  8. 8. #SMX #12C3 @AdriaK For standard A/B hypothesis tests, the smaller the difference, the bigger the sample you’ll need: 2.0% to 3.0% is a 50% increase; 2.0% to 2.5% is a 25% increase; 2.0% to 2.25% is a 12.5% increase.
  9. 9. #SMX #12C3 @AdriaK Decide on How Much Impact Your Change Should Have
      Visits | CR    | Orders | AOV | Revenue | Annual Increase
      20,000 | 2.00% | 400    | $50 | $20,000 |
      20,000 | 2.25% | 450    | $50 | $22,500 | $30,000
      20,000 | 2.50% | 500    | $50 | $25,000 | $60,000
      How much of a difference do you want to be able to detect with your test?
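The table's arithmetic is easy to script. A minimal sketch (my own illustration, not from the deck), assuming the 20,000 visits are per month, which the annualized figures imply:

```python
# A quick sketch of the slide's arithmetic: turn a CR uplift into annual revenue.
def annual_uplift(visits_per_month, baseline_cr, test_cr, aov=50):
    extra_orders = visits_per_month * (test_cr - baseline_cr)   # extra orders per month
    return extra_orders * aov * 12                              # annualized revenue gain

print(annual_uplift(20_000, 0.020, 0.0225))   # ~30,000 -> the $30,000 row
print(annual_uplift(20_000, 0.020, 0.0250))   # ~60,000 -> the $60,000 row
```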
  10. 10. #SMX #12C3 @AdriaK Pick a Sample Size Calculator: search for “power analysis for two independent proportions.” “Power” because we need a minimum sample size, “independent” because we’re showing the variants to different visitors, and “proportions” because we’re comparing rates, which are proportions.
  11. 11. #SMX #12C3 @AdriaK Calculator Options (http://bit.ly/25zI5Rv): Is the variation higher or lower than the original? That question means a “two-tailed test.” A 5% significance level is common, that is, there’s a 5% chance of a false positive. 80% statistical power is common: there is a 20% chance (1 in 5) that if there was an effect, we’d miss it. P1 = your control CR, e.g. 0.02 for 2%. P2 = your likely test CR, e.g. 0.025 for 2.5%.
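If you would rather script the calculation than use the linked calculator, here is a minimal sketch of the standard normal-approximation sample-size formula for two independent proportions. It is my own illustration; the calculator at http://bit.ly/25zI5Rv may use a slightly different formula, so its results can differ by a handful of visits.

```python
# Minimal sketch: required visits per variant for a two-tailed test of two
# independent proportions (normal approximation), alpha = 0.05, power = 0.80.
from math import ceil, sqrt
from scipy.stats import norm

def visits_per_variant(p1, p2, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96 for a 5% two-tailed test
    z_beta = norm.ppf(power)            # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    top = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(top / (p1 - p2) ** 2)

for p2 in (0.03, 0.025, 0.0225):        # 1%, 0.5%, 0.25% absolute uplift on a 2% CR
    print(p2, visits_per_variant(0.02, p2))
# Prints roughly 3,826 / 13,800 / 52,200, in line with the deck's figures.
```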
  12. 12. #SMX #12C3 @AdriaK Consequences of Significance and Power Choices: The effect of using 0.05 and 80% is that we are 4 times more likely to get a false negative than a false positive. We’re more concerned about making things worse, so we accept a higher chance that we won’t see a positive effect that is actually there.
  13. 13. #SMX #12C3 @AdriaK Those are arbitrary choices. We’re not testing pharmaceuticals. Are we really so terrified that we’ll roll out a page that isn’t an improvement? NOBODY IS GOING TO DIE
  14. 14. #SMX #12C3 @AdriaK Means that I love you. Baby.
  15. 15. #SMX #12C3 @AdriaK Necessary Sample Sizes (visits per sample): 1% change: 3,826; 0.5% change: 13,809; 0.25% change: 52,238
  16. 16. #SMX #12C3 @AdriaK Detecting a 12.5% increase in Conversion Rate requires 52,238 visits for each sample.
  17. 17. #SMX #12C3 @AdriaK Photo by Marilynn Windust https://ronmitchelladventure.com
  18. 18. #SMX #12C3 @AdriaK You’re hoping for a 0.25% uplift on a 2.0% average CR. The Control is getting 2.0% CR, and the Variant is getting 3.0% CR! #2 Another Common (Sad) Story “Why haven’t we switched to the test variant? It’s CLEARLY WINNING.”
  19. 19. #SMX #12C3 @AdriaK So you test the significance level. Success! The difference is significant. You roll out the new page, and... ...nothing happens And this is how things go awry
  20. 20. #SMX #12C3 @AdriaK A significance calculation assumes that the sample size was fixed in advance It assumes that you have a valid sample So when you ignore this and run until you get a “significant result,” you’re misusing the math Why didn’t it work?
  21. 21. #SMX #12C3 @AdriaK If you hit a period that happens to be performing well You may succumb to the temptation to stop while you’re ahead Repeated significance testing increases the rate of false positives Friends don’t let friends test significance prematurely Image: Public Domain, via Wikipedia
  22. 22. #SMX #12C3 @AdriaK Why repeated significance testing is a problem
  23. 23. #SMX #12C3 @AdriaK 5% significance means that even if there is no difference between the test and the control We’ll see an imaginary difference in the test 5% of the time Remember what significance means?
  24. 24. #SMX #12C3 @AdriaK Repeated Significance Testing is The Devil
      Given: there is no actual difference between two test variants.
      Check significance once, or only at the planned end of the test: a 5% chance of a false “Significant” result, a 95% chance of “No difference.”
      Stop as soon as any interim check shows “Significant”:
                      | Option 1    | Option 2      | Option 3    | Option 4
      1st observation | Significant | No difference | Significant | No difference
      2nd observation | -           | Significant   | -           | No difference
      End of test     | Significant | Significant   | Significant | No difference
      Likelihood: a 26% chance of ending with a “Significant” result (Options 1-3) vs. a 74% chance of “No difference” (Option 4).
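A rough simulation (my own, not from the deck) makes the inflation concrete: both variants share the same true 2% conversion rate, yet stopping at the first “significant” interim check calls a winner far more often than the nominal 5%.

```python
# Rough simulation: both variants have the SAME true 2% CR, yet checking a
# two-proportion z-test after every batch and stopping at the first
# "significant" peek declares a winner far more often than 5% of the time.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def p_value(conv_a, n_a, conv_b, n_b):
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - norm.cdf(abs(z)))          # two-tailed p-value

runs, peeks, batch, false_positives = 2_000, 10, 5_000, 0
for _ in range(runs):
    ca = na = cb = nb = 0
    for _ in range(peeks):
        ca += rng.binomial(batch, 0.02); na += batch
        cb += rng.binomial(batch, 0.02); nb += batch
        if p_value(ca, na, cb, nb) < 0.05:     # peek and stop on "significance"
            false_positives += 1
            break

print(false_positives / runs)   # well above the nominal 0.05
```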
  25. 25. #SMX #12C3 @AdriaK See the slippery slope in action! (cumulative results, checked daily)
                                | Day 1 | Day 2  | Day 3  | Day 4  | Day 5  | Day 6  | Day 7
      Visits/Variant            | 7,460 | 14,920 | 22,380 | 29,840 | 37,300 | 44,760 | 52,220
      Control conversions       | 150   | 313    | 448    | 636    | 750    | 922    | 1,098
      Control CR (2.00% baseline) | 2.01% | 2.10% | 2.00%  | 2.13%  | 2.01%  | 2.06%  | 2.10%
      Variant conversions       | 175   | 332    | 498    | 695    | 835    | 993    | 1,174
      Variant CR (2.25% target) | 2.35% | 2.23%  | 2.23%  | 2.33%  | 2.24%  | 2.22%  | 2.25%
      Significant?              | not   | not    | not    | not    | SIGNIFICANT | not | not
  26. 26. #SMX #12C3 @AdriaK Smart marketers PRE-COMMIT to a valid sample size And do not test for significance before they’ve collected it! Therefore:
  27. 27. #SMX #12C3 @AdriaK Because you have to be able to satisfy impatient observers But I neeeeeeed to test significance repeatedly!
  28. 28. #SMX #12C3 @AdriaK Solves the problem of repeated significance testing Allows you to stop the test early if the Variant is a winner Works with low conversion rates (under 10%) Sequential A/B Testing Image: http://geneticsandbeyond.blogspot.com/2014/08/the-puffinss-lair-sweat-of-hippos.html
  29. 29. #SMX #12C3 @AdriaK Sequential experiment design (http://bit.ly/1sSDz29): 1. Determine your sample size N (number of total conversions). 2. Measure the success of your Control and Variant groups. 3. Check for stopping points: if Variant - Control reaches 2.25√N, the Variant wins; if Control - Variant reaches 2.25√N, the Control wins; if Variant + Control reaches N, there is no winner.
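A minimal sketch of that stopping rule, assuming `control` and `variant` are the running conversion counts and N was fixed up front (variable names are mine):

```python
# Sequential stopping-rule check, as described on the slide.
from math import sqrt

def check_sequential(control, variant, N):
    boundary = 2.25 * sqrt(N)
    if variant - control >= boundary:
        return "stop: variant wins"
    if control - variant >= boundary:
        return "stop: control wins"
    if variant + control >= N:
        return "stop: no winner"
    return "keep collecting data"

print(check_sequential(control=180, variant=240, N=600))
# boundary = 2.25 * sqrt(600) ≈ 55, and 240 - 180 = 60, so the variant wins
```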
  30. 30. #SMX #12C3 @AdriaK Sequential Sampling Calculator http://bit.ly/1TM1LKv
  31. 31. #SMX #12C3 @AdriaK When to choose a fixed sample vs. a sequential test: Given a baseline conversion rate p and a minimum detectable effect d, compute 1.5p + d. When it is less than 36%, a sequential test will be shorter. Example: p = 2.0%, d = 12.5% (a 2.25% CR), so 1.5p + d = 15.5%.
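As a tiny helper, here is the slide's rule of thumb as code (my own sketch; note that, as in the slide's example, p is the absolute baseline rate and d the relative effect):

```python
def sequential_is_shorter(p, d):
    """Slide's rule of thumb: if 1.5*p + d is under 36%, prefer a sequential test."""
    return 1.5 * p + d < 0.36

print(sequential_is_shorter(p=0.02, d=0.125))   # True: 15.5% < 36%, so go sequential
```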
  32. 32. #SMX #12C3 @AdriaK Variant CR = better than Control. P-value = 0.18 (i.e. greater than our 0.05 significance level). When Good Math Leads to Bad Career Moves So how did the test go? Neither. We didn’t achieve significance. So which version won? We stopped this morning. So why did you stop it?! Just show it to another 10,000 visitors. We can’t do that. We have to accept that the test is over. This guy is not a team player. I am so screwed... Well, the null hypothesis... blah blah blah Blah blah p-value blah blah blah blah Image: 20th Century Fox via Amazon
  33. 33. #SMX #12C3 @AdriaK Communicating results is hard. So which one performs better? There is a 95% probability that the results we saw are not due to random chance! Why can’t this guy just answer a straight question? I hate my life. Image: 20th Century Fox via Amazon
  34. 34. #SMX #12C3 @AdriaK How to stop your test at any time and still make valid inferences!! Much easier to understand and explain the results!! Bayes’ Theorem Image via Wikipedia
  35. 35. #SMX #12C3 @AdriaK What’s the Difference?
      Frequentist: Assumes that there is no difference, and finds the probability that chance alone could have produced the experimental results seen. Focuses on not getting Type I errors. Most people don’t understand what the results mean.
      Bayesian: Finds the probability that the test is better. More forgiving of Type I errors. Easier to understand and communicate to non-technical audiences.
  36. 36. #SMX #12C3 @AdriaK Why Don’t Marketers Use Bayes’ Theorem? Calculus. This formula determines the probability that B will beat A in the long run. There’s a slightly different one if you have three test groups, etc.
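The same quantity can be approximated without the calculus. A minimal Monte Carlo sketch (my own, not the deck's; it assumes uniform Beta(1,1) priors and made-up conversion counts):

```python
# Monte Carlo estimate of P(B beats A): sample each variant's conversion rate
# from its Beta posterior and count how often B comes out ahead.
import numpy as np

rng = np.random.default_rng(0)

def prob_b_beats_a(conv_a, visits_a, conv_b, visits_b, draws=200_000):
    a = rng.beta(1 + conv_a, 1 + visits_a - conv_a, draws)   # posterior CR of A
    b = rng.beta(1 + conv_b, 1 + visits_b - conv_b, draws)   # posterior CR of B
    return (b > a).mean()

# Hypothetical numbers: A converts 200/10,000 (2.0%), B converts 235/10,000 (2.35%)
print(prob_b_beats_a(200, 10_000, 235, 10_000))   # roughly 0.95
```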
  37. 37. #SMX #12C3 @AdriaK Online calculators are your friends! But Wait!
  38. 38. #SMX #12C3 @AdriaK Cool online Bayesian calculator (http://bit.ly/24mKJaY): you enter wins and losses data. Graph: probability distributions. Table: probability of being best, and spread of conversion rates.
  39. 39. #SMX #12C3 @AdriaK 1. Decide on the probability you’re comfortable with 2. Decide how much variance you’re willing to accept How to use this calculator
  40. 40. #SMX #12C3 @AdriaK High spread, less overlap: 96% probability that B is better. But what’s the real CR? Needs more data.
  41. 41. #SMX #12C3 @AdriaK Low spread, high overlap: Not very much CR variance, but B is only 70% likely to be better.
  42. 42. #SMX #12C3 @AdriaK Less spread, less overlap: The variance of the CR isn’t as bad, and the separation of the peaks means that the CRs are different: 94% probability that B is better. We still aren’t certain about the actual CR. Sample size is only 100 conversions each!
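One way to put a number on the “spread” is a credible interval from each variant's Beta posterior. A minimal sketch with made-up counts of roughly 100 conversions each, assuming Beta(1,1) priors:

```python
# 95% credible intervals from each variant's Beta posterior: with only ~100
# conversions per variant the intervals stay wide, even though the probability
# that B is better may already look convincing.
from scipy.stats import beta

def credible_interval(conversions, visits, level=0.95):
    posterior = beta(1 + conversions, 1 + visits - conversions)
    return posterior.ppf((1 - level) / 2), posterior.ppf(1 - (1 - level) / 2)

print(credible_interval(100, 4_000))   # A: roughly 2.0%-3.0%
print(credible_interval(100, 3_200))   # B: roughly 2.5%-3.8%
```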
  43. 43. #SMX #12C3 @AdriaK You might actually see
  44. 44. #SMX #12C3 @AdriaK Allows you to start the test with some assumptions, called “priors” Can include: • the prior success probability (our belief about the average conversion rate) • How much variance you expect Bayesian’s interesting twist
  45. 45. #SMX #12C3 @AdriaK 1. Set your “priors” 2. Input your test data 3. Get back the probability that the test variant performs better Different cool Bayesian calculator http://bit.ly/1Wzrtro
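A minimal sketch of how such priors can work, assuming a Beta prior where a hypothetical `prior_strength` stands for how many visits' worth of belief you start with (all numbers made up):

```python
# Encode a prior belief ("CR is around 2%, backed by about 1,000 visits' worth
# of confidence") as a Beta prior, update it with test data, then compare.
import numpy as np

rng = np.random.default_rng(0)

prior_cr, prior_strength = 0.02, 1_000                     # hypothetical prior belief
alpha0, beta0 = prior_cr * prior_strength, (1 - prior_cr) * prior_strength

def posterior(conversions, visits):
    return alpha0 + conversions, beta0 + visits - conversions

ctrl_draws = rng.beta(*posterior(190, 10_000), 100_000)    # hypothetical control data
var_draws = rng.beta(*posterior(230, 10_000), 100_000)     # hypothetical variant data
print((var_draws > ctrl_draws).mean())                     # P(variant performs better)
```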
  46. 46. #SMX #12C3 @AdriaK Actual, Understandable Results
  47. 47. #SMX #12C3 @AdriaK You can make inferences from low traffic and low conversions When someone says "What's the probability that the new page outperforms the old one?", you can give them an answer! Advantage of Bayesian results
  48. 48. #SMX #12C3 @AdriaK 1. You know how not to run a fixed sample test 2. You know you can run a sequential sample test when you need ongoing information about the results 3. You know how to run a Bayesian test, where you can keep checking your progress AND explain the results easily So now what?
  49. 49. #SMX #12C3 @AdriaK Review: How to Design your Experiment. Are you trying to detect a big difference, or a small difference? Use the formula 1.5p + d: for a big difference (1.5p + d > 36%), use a normal fixed-sample test; for a small difference (1.5p + d < 36%), use a sequential test. Do the people you report to get confused or unhappy when you try to explain significance and p-values to them? Run a Bayesian test.
  50. 50. #SMX #12C3 @AdriaK So That’s It, Then?
      Tests using significance: 1. Use a sample calculator. 2. Run the test for the specified sample. 3. Profit!
      Bayesian test: 1. Decide how solid you want your probability estimate to be. 2. Run the test and update the data. 3. Profit!
  51. 51. #SMX #12C3 @AdriaK I’m all about the tough love.
  52. 52. #SMX #12C3 @AdriaK We are not measuring consistent user groups • Time of day • Day of week • Seasonality • Sales The Problem of Illusory Lift
  53. 53. #SMX #12C3 @AdriaK Run your tests long enough to cover at least one entire traffic/conversion cycle Monday-Sunday or equivalent full week Account for business cycles
  54. 54. #SMX #12C3 @AdriaK Daily differences in performance
  55. 55. #SMX #12C3 @AdriaK Don’t run your test too long Visitors delete their cookies and will pollute your samples Account for user behavior
  56. 56. #SMX #12C3 @AdriaK It’s probably more than you think: Nearly 40 percent of Internet users delete cookies from their primary computers on at least a monthly basis (JupiterResearch, 2005). 53 percent delete cookies, cache or browsing history to help protect their privacy online (TRUSTe/National Cyber Security Alliance, U.S. Consumer Privacy Index, January 2016).
  57. 57. #SMX #12C3 @AdriaK • Pre-commit to a sample size/experimental design • Fixed Sample A/B testing – no peeking before it’s done • Sequential A/B testing – built-in peeking • Bayesian – easier to understand the results • Collect samples for a full business cycle, but not too long Summary
  58. 58. #SMX #12C3 @AdriaK Fixed sample calculator Stats Dept., U of British Columbia http://bit.ly/25zI5Rv Sequential sampling calculator Evan Miller http://bit.ly/1TM1LKv Simple Bayesian calculator Peak Conversion http://bit.ly/24mKJaY Bayesian calculator with priors Lyst http://bit.ly/1Wzrtro Calculators I used
  59. 59. #SMX #12C3 @AdriaK LEARN MORE: UPCOMING @SMX EVENTS THANK YOU! SEE YOU AT THE NEXT #SMX
