# How to Keep Your Gains from A/B Tests Without Accidentally Killing Them Later

Limits of A/B Tests
A/B tests don’t give you perfect decisions.
No matter what you do, you’re never 100% certain
If we’re not careful, winners aren’t really winners
Your conversions go up… and then they come back down
The Standard Solution
Run your test until you hit 95% statistical significance.
Go to getdatadriven.com if you need a significance calculator.
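
For readers who want to see what such a calculator might compute, here is a minimal sketch using a pooled two-proportion z-test. This is an assumption: the slides do not say which test the linked calculator uses, and the function name `significance` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def significance(conv_a, n_a, conv_b, n_b):
    """Return 1 - p for a two-sided, pooled two-proportion z-test.

    conv_a / n_a: conversions and visitors for the control.
    conv_b / n_b: conversions and visitors for the variant.
    Assumption: this is one common way to define "significance",
    not necessarily the one used by any particular calculator.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0  # no conversions (or all conversions): nothing to compare
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


# Hypothetical numbers: 50/1000 conversions on control, 75/1000 on variant.
print(f"{significance(50, 1000, 75, 1000):.3f}")
```

A result at or above 0.95 would correspond to the "95% statistical significance" bar on this slide, under the assumptions above.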
Martin Goodson’s PDF on poor testing methods: kiss.ly/bad-testing
This gives us the best data but not necessarily the best ROI.
So how far do we take this?
Simulation Time!
We modeled several A/B testing strategies. Using Monte Carlo simulations, we tested different strategies over 1 million observations (people).
Will Kurt gets full credit for all this. @willkurt
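
The slides do not include the simulation code, so the following is only a rough sketch of how a Monte Carlo comparison like this could be set up. Everything specific here is an assumption: the 5% base conversion rate, the way each variant's true rate is drawn, the 50/50 traffic split, the one-sided z-test, and the names `confidence` and `simulate`.

```python
import random
from math import sqrt
from statistics import NormalDist


def confidence(conv_a, n_a, conv_b, n_b):
    """One-sided confidence that B's true rate beats A's (pooled z-test)."""
    if n_a == 0 or n_b == 0:
        return 0.5
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.5
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return NormalDist().cdf(z)


def simulate(threshold, max_people, total_people=1_000_000,
             base_rate=0.05, spread=0.1, check_every=50, seed=0):
    """Run back-to-back A/B tests and return total conversions.

    threshold:  confidence needed to replace the control (e.g. 0.99).
    max_people: visitors per test before moving on to the next test.
    Assumed model: each variant's true rate is drawn from a normal
    distribution centred on the current control, so it is as likely
    to be worse as better.
    """
    rng = random.Random(seed)
    control = base_rate
    conversions = 0
    seen = 0
    while seen < total_people:
        variant = min(1.0, max(0.0, rng.gauss(control, spread * control)))
        conv_a = n_a = conv_b = n_b = 0
        test_size = min(max_people, total_people - seen)
        for i in range(test_size):
            if i % 2 == 0:  # alternate visitors between control and variant
                hit = rng.random() < control
                n_a += 1
                conv_a += hit
            else:
                hit = rng.random() < variant
                n_b += 1
                conv_b += hit
            conversions += hit
            # Peek periodically; adopt the variant once the threshold is hit.
            if (i + 1) % check_every == 0 and \
                    confidence(conv_a, n_a, conv_b, n_b) >= threshold:
                control = variant
                break
        seen += i + 1
    return conversions


# Small run for speed; the slides describe 1 million observations.
for name, threshold, cap in [("Realist", 0.99, 2000),
                             ("Impatient Marketer", 0.95, 500)]:
    print(name, simulate(threshold, cap, total_people=50_000))
```

Total conversions here plays the role of a score: the more visitors are served by a genuinely better page, the higher it goes. Results from this sketch should not be expected to match the numbers in the deck.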
The Scientist: (1) Pick the minimal improvement; (2) Determine your sample size; (3) Determine degree of certainty (95%); (4) Start test but don’t check it early; (5) If results aren’t significant, keep control
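
Steps 1 to 3 can be sketched with the standard sample-size approximation for comparing two proportions. The 80% power default is an assumption (the slides only mention 95% certainty), and `sample_size_per_variant` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(base_rate, min_lift, confidence=0.95, power=0.80):
    """Approximate visitors needed in EACH variant of a two-sided test.

    base_rate: current conversion rate, e.g. 0.05 for 5%.
    min_lift:  minimal relative improvement worth detecting, e.g. 0.20.
    power:     chance of detecting that lift if it is real (assumed 80%).
    """
    p1 = base_rate
    p2 = base_rate * (1 + min_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical inputs: 5% base rate, 20% minimal relative improvement.
print(sample_size_per_variant(0.05, 0.20))
```

Halving the minimal improvement roughly quadruples the required sample, which is why the Scientist approach needs a lot of traffic.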
Results for the Scientist:
The Reckless Marketer: (1) Waits until 80% significance; (2) Calls a winner as soon as 80% gets hit
Results for the Reckless Marketer:
The Impatient Marketer: (1) Waits for 95% significance; (2) Moves on to the next test after 500 people
Results for the Impatient Marketer:
The Realist: (1) Waits for 99% significance; (2) Moves on to the next test after 2,000 people
Results for the Realist:
The Persistent Realist: (1) Waits for 99% significance; (2) Moves on to the next test after 20,000 people
Results for the Persistent Realist:
The Blitz Realist: (1) Waits for 99% significance; (2) Moves on to the next test after 200 people
Results for the Blitz Realist:
Let’s compare them using the area under the curve.
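
As a rough illustration of this comparison, the snippet below takes the scores reported on the deck's A/B Strategy Scores slide and expresses each as a lift over the No Testing baseline. The dictionary name and the lift calculation are illustrative additions, not part of the original analysis.

```python
# Scores as reported on the "A/B Strategy Scores" slide of this deck.
scores = {
    "Scientist": 67759,
    "Reckless Marketer": 57649,
    "Impatient Marketer": 60532,
    "Realist": 67896,
    "Persistent Realist": 68346,
    "Blitz Realist": 62836,
    "No Testing": 50000,
}

baseline = scores["No Testing"]

# Percentage lift of each strategy over doing no testing at all.
lift = {name: 100 * (score - baseline) / baseline
        for name, score in scores.items()}

# Rank strategies from highest score to lowest.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranking:
    print(f"{name:<20} {score:>6}  {lift[name]:5.1f}% over no testing")
```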
Don’t make decisions at less than 95% significance.
You’ll waste all the time you spend testing
We have 3 viable strategies for making this work: (1) Be a scientist at 95%; (2) Only make changes at 99%; (3) Sloppy 95% but make it up in volume
Be a scientist when you have lots of data and resources: (1) Pick the minimal improvement; (2) Determine your sample size; (3) Determine degree of certainty (95%); (4) Start test but don’t check it early; (5) If results aren’t significant, keep control
If you don’t have the data or resources to be a scientist, go fast at 99%.
And if you still want to play at 95% without being a scientist, never stop testing.
How We A/B Test
First, get volume to 4000+ people/month.
Only make changes at 99% significance.
Let the test run at least 1 week before checking results.
If not at 99% after two weeks, launch the next test.
If the next test isn’t ready, let it keep running while you build the next one.
The KISSmetrics A/B Testing Strategy: (1) Get to 4,000 people/month for test; (2) Only change the control if you reach 99%; (3) Check results after 1 week; (4) Launch the next test at 2 weeks; (5) Let old tests run if you’re still building
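
The five rules above can be read as a small decision procedure. The sketch below is one possible reading, not KISSmetrics' actual tooling; the function name, parameters, and returned strings are all hypothetical.

```python
def next_action(days_running, confidence, next_test_ready, monthly_people):
    """Suggest what to do with a running test under the five rules.

    days_running:    days since the test started.
    confidence:      current significance as a fraction, e.g. 0.99.
    next_test_ready: whether the next test is built and ready to launch.
    monthly_people:  people per month entering the test.
    """
    if monthly_people < 4000:
        return "build traffic first"             # rule 1
    if days_running < 7:
        return "keep running, do not check yet"  # rule 3
    if confidence >= 0.99:
        return "change the control"              # rule 2
    if days_running >= 14 and next_test_ready:
        return "keep control, launch next test"  # rule 4
    return "keep running"                        # rule 5


print(next_action(days_running=10, confidence=0.97,
                  next_test_ready=False, monthly_people=5000))
```

One judgment call in this reading: a test that reaches 99% between week 1 and week 2 changes the control right away rather than waiting for the two-week mark.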
This strategy isn’t perfect. It’s a balance between good data and speed.

Published in: Marketing, Technology, Education

### How to Keep Your Gains from A/B Tests Without Accidentally Killing Them Later

1. Keep Your Gains from A/B Tests Without Killing Them Later. Lars Lofgren and Will Kurt, May 2014
2. @larslofgren Hit me up
3. We’ll cover… (1) The limits of A/B tests; (2) The standard solutions; (3) Simulations! Woohoo!; (4) The 3 strategies of A/B testing that work; (5) How we A/B test at KISSmetrics. #KISSwebinar
4. WATCH WEBINAR RECORDING NOW
5. Limits of A/B Tests
6. A/B tests don’t give you perfect decisions.
7. No matter what you do, you’re never 100% certain
8. If we’re not careful, winners aren’t really winners
9. Your conversions go up… and then they come back down
10. The Standard Solution
11. Run your test until you hit 95% statistical significance.
12. Go to getdatadriven.com if you need a significance calculator.
13. Scientific A/B testing: (1) Pick the minimal improvement; (2) Determine your sample size; (3) Determine degree of certainty (95%); (4) Start test but don’t check it early; (5) If results aren’t significant, keep control
14. Martin Goodson’s PDF on poor testing methods: kiss.ly/bad-testing
15. This gives us the best data but not necessarily the best ROI.
16. So how far do we take this?
17. Simulation Time!
18. We modeled several A/B testing strategies. Using Monte Carlo simulations, we tested different strategies over 1 million observations (people).
19. Will Kurt gets full credit for all this. @willkurt
20. The Scientist: (1) Pick the minimal improvement; (2) Determine your sample size; (3) Determine degree of certainty (95%); (4) Start test but don’t check it early; (5) If results aren’t significant, keep control
21. Results for the Scientist:
22. The Reckless Marketer: (1) Waits until 80% significance; (2) Calls a winner as soon as 80% gets hit
23. Results for the Reckless Marketer:
24. The Impatient Marketer: (1) Waits for 95% significance; (2) Moves on to the next test after 500 people
25. Results for the Impatient Marketer:
26. The Realist: (1) Waits for 99% significance; (2) Moves on to the next test after 2,000 people
27. Results for the Realist:
28. The Persistent Realist: (1) Waits for 99% significance; (2) Moves on to the next test after 20,000 people
29. Results for the Persistent Realist:
30. The Blitz Realist: (1) Waits for 99% significance; (2) Moves on to the next test after 200 people
31. Results for the Blitz Realist:
32. Let’s compare them using the area under the curve.
33. A/B Strategy Scores

    | Strategy | Conditions | Score |
    | --- | --- | --- |
    | Scientist | Stats like a pro | 67,759 |
    | Reckless Marketer | 80% | 57,649 |
    | Impatient Marketer | 95% and 500 people | 60,532 |
    | Realist | 99% and 2,000 people | 67,896 |
    | Persistent Realist | 99% and 20,000 people | 68,346 |
    | Blitz Realist | 99% and 200 people | 62,836 |
    | No Testing | Testing? NOPE! | 50,000 |

    Each score is the area under the curve from the simulation. The higher the score, the more conversions you received.
34. [Bar chart: A/B Strategy Scores, axis from 0 to 70,000. Persistent Realist 68,346; Realist 67,896; Scientist 67,759; Blitz Realist 62,836; Impatient 60,532; Reckless 57,649; No Testing 50,000.]
36. 3 Strategies
37. Don’t make decisions at less than 95% significance.
38. You’ll waste all the time you spend testing
39. We have 3 viable strategies for making this work: (1) Be a scientist at 95%; (2) Only make changes at 99%; (3) Sloppy 95% but make it up in volume
40. Be a scientist when you have lots of data and resources: (1) Pick the minimal improvement; (2) Determine your sample size; (3) Determine degree of certainty (95%); (4) Start test but don’t check it early; (5) If results aren’t significant, keep control
41. If you don’t have the data or resources to be a scientist, go fast at 99%.
42. And if you still want to play at 95% without being a scientist, never stop testing.
43. How We A/B Test
44. First, get volume to 4000+ people/month.
45. Only make changes at 99% significance.
46. Let the test run at least 1 week before checking results.
47. If not at 99% after two weeks, launch the next test.
48. If the next test isn’t ready, let it keep running while you build the next one.
49. The KISSmetrics A/B Testing Strategy: (1) Get to 4,000 people/month for test; (2) Only change the control if you reach 99%; (3) Check results after 1 week; (4) Launch the next test at 2 weeks; (5) Let old tests run if you’re still building
50. This strategy isn’t perfect. It’s a balance between good data and speed.
51. Remember the 3 strategies: (1) Be a scientist at 95%; (2) Only make changes at 99%; (3) Sloppy 95% but make it up in volume
52. Q&A Time! Lars Lofgren @larslofgren llofgren@kissmetrics.com