
Keynote Ton Wesseling at Superweek 2020: How an analyst can add value!

How an analyst can add value to your A/B-testing program.

What should be done with the A/B-test program?
A: Increase budgets (more A/B-tests: quantity)
B: Increase knowledge (better A/B-tests: quality)
C: Decrease budgets (fewer A/B-tests: quantity)

As an analyst:
You can calculate the answer
You have a big influence on the outcome

Published in: Data & Analytics

  1. Ton Wesseling, Jan 27 - 31, 2020. How an analyst can add value! Digital Experiments
  2. TON@ONLINEDIALOGUE.COM
  3. Business Science (icon: User Growth by Wilson Joseph for the Noun Project)
  4. Internet / websites (icon: Internet by Cindy Hu for the Noun Project)
  5. Data Analyst (icon: Data Analyst by Five by Five for the Noun Project)
  6. A/B-testing (icon: A/B-test by Evangeline White for the Noun Project)
  7. Everyone seems to like it!
  8. A/B-testing Culture (2018 research by Optimizely)
  9. Hierarchy of evidence pyramid
  10. A/B-testing mastery course. This talk mostly makes sense if you have 10,000 transactions or more per month: enough to get experimentation into the DNA of your organization.
  11. “Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day…” (Jeff Bezos, CEO Amazon)
  12. (no text)
  13. Data Analyst (icon: Data Analyst by Five by Five for the Noun Project). In the new world, it’s the companies that have lots of data and know how to use it properly that outperform the competition.
  14. What should be done with the A/B-test program? A. Increase budgets: more A/B-tests (quantity). B. Increase knowledge: better A/B-tests (quality). C. Decrease budgets: fewer A/B-tests (quantity).
  15. This should always be the answer: A. Increase budgets, more A/B-tests (quantity). But in reality it’s different... ✓ You can calculate the answer. ✓ You have a big influence on the outcome.
  16. DEF: the task of an analyst within an A/B-testing Culture. 1. Data 2. Effectiveness 3. Finance
  17. DEF: the task of an analyst within an A/B-testing Culture. 1. Data 2. Effectiveness 3. Finance
  18. Data: let there be high-quality data
  19. Make sure all funnels are measured…
  20. Make sure your testing solution has all users. Users on template: 42,186 (100%). Users in the tool: 37,652 (89%). Users with code executed: 34,312 (81%).
  21. What if my experiments had 20% more users?
  22. Recognizing returning users
  23. Recognizing returning users (image: Buddhini S. on Jargon Wall)
  24. Be able to segment on page interactions
  25. Be able to segment on who can be influenced
  26. Be able to create behavioral segments. Typical ecommerce flow example (funnel + average time): ✓ all users on your website with enough time to take action ✓ all users on your website with at least some interaction ✓ all users on your website with heavy interaction ✓ all users on your website with clear intent to buy ✓ all users on your website that are willing to buy ✓ all users on your website that succeed in buying ✓ all users on your website that return with intent to buy more
  27. Scientific method
  28. Data: let there be high-quality data
  29. DEF: the task of an analyst within an A/B-testing Culture. 1. Data 2. Effectiveness 3. Finance
  30. Effectiveness: make sure you work on stuff with the highest potential outcome
  31. Statistical Power: the likelihood that an experiment will detect an effect, when there is an effect there to be detected
  32. Power & Significance: a grid of Measured ("New version is NOT better" / "New version is better") against Reality ("New version is NOT better" / "New version is better")
  33. Power & Significance: the same grid in hypothesis terms, Measured ("Do not reject H0" / "Reject H0") against Reality ("H0 is true" / "H0 is false")
  34. Significance: H0 is true and H0 is not rejected → correct decision ☺
  35. Significance: adds H0 is true but H0 is rejected → Type I error, false positive (α)
  36. Power: adds H0 is false and H0 is rejected → correct decision ☺
  37. Power: adds H0 is false but H0 is not rejected → Type II error, false negative (β)
  38. Power, back in business terms. Reality "New version is NOT better": measured NOT better → correct decision ☺; measured better → Type I, false positive (α). Reality "New version is better": measured NOT better → Type II, false negative (β); measured better → correct decision ☺
  39. Power & Significance rule of thumb. Power: when you start, try to test on pages with a high power (>80%) → otherwise you don’t detect effects when there is an effect to be detected (false negatives). Significance: when you start, try to test against a high enough significance level (90%) → otherwise you’ll declare winners when in reality there isn’t an effect (false positives).
  40. This looks good!
  41. This is fascinating!
  42. This makes me sad!
  43. https://abtestguide.com/abtestsize/
  44. (no text)
  45. https://ondi.me/bandwidth
  46. Prioritize based on MDE (minimum detectable effect) to start
  47. Test power determination: determine unique weekly visitors per page type. ✓ We run and evaluate A/B tests on the unique visitor metric: we want to influence unique users.
  48. Test power determination: determine unique weekly visitors per page type. → Build a segment for each page type / segment / test platform combination.
  49. Test power determination: determine unique weekly visitors per page type. → Look up the number of weekly visitors with this behavior (select multiple weeks and divide by the number of weeks to account for fluctuation).
  50. Test power determination: determine unique visitors with a conversion per page type. ✓ Visitors must have seen the test page before they converted (→ Converted).
  51. Test power determination: determine unique visitors with a conversion per page type. → Build a second sequential segment: page seen → converted.
  52. Test power determination: determine unique visitors with a conversion per page type. → Look up the number of weekly visitors with a conversion (select multiple weeks and divide by the number of weeks to account for fluctuation). → Make sure you don’t have sampled data; otherwise select a shorter period.
  53. https://ondi.me/bandwidth
  54. Prioritize based on MDE to start
  55. Prioritize based on measured results!
  56. Prioritize based on measured results!
  57. Prioritize based on measured results!
  58. Prioritize based on measured results!
  59. Prioritize based on measured results!
  60. Prioritize based on measured results! With real data from your program, your prioritization will change!
  61. (no text)
  62. Type-M errors…
  63. Prioritize based on measured results? Times (100% - Type-M error), of course! Low power gives a higher Type-M error.
  64. Effectiveness: make sure you work on stuff with the highest potential outcome
  65. DEF: the task of an analyst within an A/B-testing Culture. 1. Data 2. Effectiveness 3. Finance
  66. Finance: business case calculations
  67. What does your calculation look like? If significant result: extra new customers per week × 52 weeks effective × average lifetime value
  68. What does your calculation look like? If significant result: extra transactions per week × 26 weeks effective × average order value
  69. So this experiment will bring us €232,840 (revenue in 6 months after implementation). • And then just add up all the winners from the past year? • Which makes €5,273,132 for the whole program? • And divide that by the yearly costs of €623,400? • So your ROI is €8.46 revenue per €1 investment?
  70. Implementing winners…
  71. (no text)
  72. So that one experiment will bring us €232,840 × (100% - Type-M error %)? (Yes, if it indeed is a true positive.) €232,840 × (100% - 12%) = €204,899
  73. Let’s see if the results are already significant (image: Focusonpc via Pixabay)
  74. How NOT to shorten the length of your A/B-test: https://www.einarsen.no/is-your-ab-testing-effort-just-chasing-statistical-ghosts/
  75. How NOT to shorten the length of your A/B-test: https://www.evanmiller.org/how-not-to-run-an-ab-test.html
  76. How to shorten the length of your A/B-test: https://codeascraft.com/2018/10/03/how-etsy-handles-peeking-in-a-b-testing/
  77. How to shorten the length of your A/B-test: https://medium.com/convoy-tech/the-power-of-bayesian-a-b-testing-f859d2219d5
  78. How to shorten the length of your A/B-test: https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d “CUPED tries to remove variance in a metric that can be accounted for by pre-experiment information”
  79. You could even find more wins: https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
  80. SRM (sample ratio mismatch) checks, anybody? Also check Lukas Vermeer at #CH2019: https://conversionhotel.com/session/keynote-2019-run-better-experiments-srm-checks/
  81. Running experiment dashboard
  82. Running experiment dashboard
  83. Should I stop the experiment? ✓ Is something broken? → YES! ✓ Is there an SRM error? → YES! ✓ Are we losing too much money? → YES! (And maybe also when there is a low chance of becoming significant and you can start the next experiment now.)
  84. Back to the calculation: €232,840 × (100% - Type-M error %)? (Yes, if it indeed is a true positive.) €232,840 × (100% - 12%) = €204,899
  85. Implementing winners…
  86. What is your False Discovery Rate? Significance border: 90%. 100 experiments. 20 significant outcomes. FDR: 50%*, because (100% - 90%) × 100 = 10 expected false positives out of 20 wins. (*It’s a little lower; this is the poor man’s calculation. With every real win, the number of experiments without a win becomes lower, which leads to fewer false positives.)
  87. So not really 50%: FDR* = (Measured Wins - ((Measured Wins - ((100% - Confidence Level) × Experiments)) / Confidence Level)) / Measured Wins = (20 - ((20 - ((100% - 90%) × 100)) / 90%)) / 20 = 44% (*Only if your power on all experiments was 100%. Your power will be lower, which means you had more real wins that were not measured (false negatives). This leads to fewer experiments without an effect, so the number of false positives will be even lower.)
  88. Rule of thumb: once you have 10 winners or more, you can calculate your True Discovery Rate: TDR = Power × (Winners + Significance - 1) / (Winners × (Power + Significance - 1)), where Winners is the share of experiments that won. Example: 80% × (20% + 90% - 1) = 0.08 and 20% × (80% + 90% - 1) = 0.14, so TDR = 0.08 / 0.14 = 57.14%
  89. FDR / TDR calculator: https://abtestguide.com/fdr/
  90. FDR / TDR calculator
  91. So all your experiments will bring you: sum of (every winner × (100% - Type-M error % per winner)) × True Discovery Rate × Implementation % (within x months…), assuming every new win is tested on the new default where all earlier wins are implemented
  92. So all your experiments will bring you: €5,273,132 × (100% - 12% average Type-M) × 57.14% = €2,651,500
  93. Maximize your growth within your ROI limit: ROI = Value of A/B-testing for Optimization / Costs of A/B-testing for Optimization
  94. Are you above or below your ROI limit? 1. Above: increase budgets. 2. Below: increase knowledge. 3. Still below: decrease budgets.
  95. Are you above or below your ROI limit? ① Above: increase budgets • more A/B-tests (quantity) • lower win%, more winners ② Below: increase knowledge • better A/B-tests (quality) • higher win%, more winners ③ Still below: decrease budgets • fewer A/B-tests (quantity) • higher win%, fewer winners
  96. You can help get to this answer: A. Increase budgets, more A/B-tests (quantity). ✓ You can calculate the answer. ✓ You have a big influence on the outcome.
  97. An A/B-testing for growth analyst (icon: Data Analyst, the Noun Project): 1. Makes sure high-quality Data is available 2. Steers the chance of an Effect with data 3. Reports on the real Financial impact
  98. (no text)
  99. Ton Wesseling, https://ondi.me/tonw. Let’s connect on LinkedIn. Latest article on A/B-testing:
  100. Ton Wesseling, Jan 27 - 31, 2020. How an analyst can add value! Digital Experiments
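
Slides 43 to 52 turn weekly visitors and converters per page type into a minimum detectable effect (MDE). The calculators linked on the slides are the reference. As a hedged approximation, the sketch below uses the common normal-approximation formula for a two-proportion test, with the baseline variance p(1 - p) in both groups. It is one-sided by default, which is an assumption, and uses the 90% significance and 80% power from slide 39. The function names, the two-week default and the example counts are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def weekly_average(counts_per_week: list[int]) -> float:
    """Average several weeks to smooth out fluctuation (slides 49 and 52)."""
    return sum(counts_per_week) / len(counts_per_week)


def relative_mde(weekly_visitors: float, weekly_converters: float, weeks: int = 2,
                 variants: int = 2, significance: float = 0.90, power: float = 0.80,
                 two_sided: bool = False) -> float:
    """Approximate relative MDE for a conversion-rate A/B test (normal approximation)."""
    p = weekly_converters / weekly_visitors        # baseline conversion rate
    n = weekly_visitors * weeks / variants         # unique visitors per variant
    alpha = 1 - significance
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    absolute_mde = (z_alpha + z_power) * sqrt(2 * p * (1 - p) / n)
    return absolute_mde / p                        # relative to the baseline rate


# Hypothetical page type: three weeks of unique visitors and converters
visitors = weekly_average([50_000, 52_000, 48_000])
converters = weekly_average([1_500, 1_560, 1_440])
print(f"MDE after 2 weeks: {relative_mde(visitors, converters):.1%}")
# Slide 21's question: the same page with 20% more users in the experiment
print(f"with 20% more users: {relative_mde(visitors * 1.2, converters * 1.2):.1%}")
```

The MDE shrinks with the square root of the sample size. So 20% more users lowers it by roughly 9% (1 / √1.2 ≈ 0.91), not by 20%.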
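
Slides 62 and 63 note that low power inflates Type-M (magnitude) errors: results that reach significance overstate the true effect. Assume the effect estimate is normally distributed around the true effect. Then the expected exaggeration of significant results has a closed form from the truncated normal. This is the same quantity that Gelman and Carlin's "retrodesign" analysis estimates by simulation. The function below is a hypothetical sketch, and two-sided testing is an assumption. How the ratio maps to the slides' Type-M percentage is left open.

```python
from statistics import NormalDist


def power_and_exaggeration(true_effect: float, std_error: float,
                           significance: float = 0.90,
                           two_sided: bool = True) -> tuple[float, float]:
    """Return (power, exaggeration ratio) for an estimate ~ N(true_effect, std_error).

    Exaggeration ratio = E[|estimate| given significance] / true_effect (true_effect > 0).
    """
    z = NormalDist()
    alpha = 1 - significance
    crit = z.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha) * std_error
    upper = (crit - true_effect) / std_error       # standardized upper threshold
    p_upper = 1 - z.cdf(upper)                     # P(estimate > crit)
    e_upper = true_effect * p_upper + std_error * z.pdf(upper)  # E[estimate; estimate > crit]
    p_lower = e_lower = 0.0
    if two_sided:
        lower = (-crit - true_effect) / std_error  # standardized lower threshold
        p_lower = z.cdf(lower)                     # P(estimate < -crit): wrong-sign "wins"
        e_lower = -true_effect * p_lower + std_error * z.pdf(lower)  # E[-estimate; estimate < -crit]
    power = p_upper + p_lower
    return power, (e_upper + e_lower) / power / true_effect


for se in (0.35, 1.0):  # hypothetical standard errors for a true lift of 1.0
    power, ratio = power_and_exaggeration(1.0, se)
    print(f"power {power:.0%}: significant results overstate the effect {ratio:.2f}x")
```

Dividing a measured lift by this ratio is one way to deflate it. Slides 72 and 92 instead multiply by (100% - Type-M error %).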
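
The business-case arithmetic on slides 67 to 72 is plain multiplication. Below is a minimal sketch with hypothetical function names. Only the euro amounts and the 12% Type-M error come from the slides; the `business_case` inputs are made up.

```python
def business_case(extra_per_week: float, weeks_effective: int, value_per_unit: float) -> float:
    """Slides 67-68: extra customers or transactions per week x weeks effective x value."""
    return extra_per_week * weeks_effective * value_per_unit


def deflate_for_type_m(value: float, type_m_error: float) -> float:
    """Slide 72: value x (100% - Type-M error %), only valid for a true positive."""
    return value * (1 - type_m_error)


def naive_roi(total_revenue: float, yearly_costs: float) -> float:
    """Slide 69: summed winner revenue per euro of yearly program costs."""
    return total_revenue / yearly_costs


print(business_case(50, 26, 80.0))               # hypothetical inputs
print(round(deflate_for_type_m(232_840, 0.12)))  # 204899
print(round(naive_roi(5_273_132, 623_400), 2))   # 8.46
```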
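
Slides 80 and 83 treat a sample ratio mismatch (SRM) as a reason to stop. A common check is a chi-square goodness-of-fit test of the observed split against the intended one. For two variants the test has one degree of freedom, so its p-value is erfc(√(χ²/2)). In the minimal sketch below, the function names, the example counts and the 0.001 alarm threshold are assumptions, not from the talk.

```python
from math import erfc, sqrt


def srm_p_value(users_a: int, users_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-variant split."""
    total = users_a + users_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((users_a - expected_a) ** 2 / expected_a
            + (users_b - expected_b) ** 2 / expected_b)
    return erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 df


def has_srm(users_a: int, users_b: int, threshold: float = 0.001) -> bool:
    """Flag a mismatch when the observed split is very unlikely under the intended one."""
    return srm_p_value(users_a, users_b) < threshold


print(has_srm(50_000, 49_700))  # False (hypothetical counts)
print(has_srm(50_000, 48_500))  # True (hypothetical counts)
```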
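
Slides 86 and 87 can be checked in a few lines. The full-power version follows from assuming 100% power. Let T be the number of true wins, N the number of experiments and CL the confidence level. Then measured wins are W = T + (1 - CL)(N - T). Solving gives T = (W - (1 - CL)N) / CL, and the FDR is (W - T) / W. The function names are hypothetical.

```python
def fdr_poor_man(wins: int, experiments: int, confidence: float) -> float:
    """Slide 86: expected false positives if no experiment had a real effect."""
    return (1 - confidence) * experiments / wins


def fdr_full_power(wins: int, experiments: int, confidence: float) -> float:
    """Slide 87: FDR when every real effect is detected (100% power)."""
    true_wins = (wins - (1 - confidence) * experiments) / confidence
    return (wins - true_wins) / wins


print(f"{fdr_poor_man(20, 100, 0.90):.0%}")    # 50%
print(f"{fdr_full_power(20, 100, 0.90):.0%}")  # 44%
```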
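
Slide 88's formula follows from the win rate w. Let π be the share of experiments with a real effect. Then w = π·Power + (1 - π)(1 - Significance). Solving for π and taking TDR = π·Power / w gives the slide's expression. Below is a minimal sketch with a hypothetical function name. The guard rejects win rates (or powers) that do not exceed the expected false-positive rate, where the formula breaks down.

```python
def true_discovery_rate(win_rate: float, power: float, significance: float) -> float:
    """Slide 88: share of measured winners that are real effects."""
    false_positive_rate = 1 - significance
    if win_rate <= false_positive_rate or power <= false_positive_rate:
        raise ValueError("win rate and power must exceed the false-positive rate")
    return power * (win_rate + significance - 1) / (win_rate * (power + significance - 1))


print(f"{true_discovery_rate(0.20, 0.80, 0.90):.2%}")  # 57.14%
```

With power set to 100% the result is 5/9 ≈ 55.6%. That equals 1 minus the 44% FDR of slide 87, so the two slides agree.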
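
Slides 91 to 93 combine the pieces. Below is a minimal sketch with hypothetical names. Slide 92 applies one average Type-M error to the summed winners and does not mention an implementation percentage. The example therefore uses a single aggregate value and assumes 100% implementation. The ROI line applies slide 93's definition to slide 92's value and slide 69's yearly costs; the slides leave that combination to the reader.

```python
def program_value(winner_values: list[float], type_m_errors: list[float],
                  tdr: float, implementation_share: float = 1.0) -> float:
    """Slide 91: sum(winner x (1 - Type-M)) x TDR x implementation share."""
    if len(winner_values) != len(type_m_errors):
        raise ValueError("need one Type-M error per winner")
    deflated = sum(v * (1 - m) for v, m in zip(winner_values, type_m_errors))
    return deflated * tdr * implementation_share


value = program_value([5_273_132], [0.12], tdr=0.5714)  # slide 92
roi = value / 623_400                                    # slide 93, with slide 69's costs
print(f"€{value:,.0f}, ROI €{roi:.2f} per €1")          # €2,651,500, ROI €4.25 per €1
```

Using one average Type-M error is exact only if the average is weighted by winner value. Otherwise, pass the winners individually.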
